PREFACE TO VOLUME 3




This is the final volume of our treatise. It has taken longer to write than we had hoped. To some extent this has been due to our involvement with other work, but it is also attributable to the amount of development which has been going on in recent years in the subjects dealt with in this volume, which are the analysis of variance, the design of experiments, sample survey theory, multivariate analysis, and time-series. It becomes increasingly difficult to know what is permanent and what is ephemeral in the spate of current research. In deciding what to omit and what to admit, there have been occasions when we have been reminded of what Thackeray said about Macaulay, that he read a book to write a sentence.

As with the first two volumes, this one is self-contained in three respects: it lists its own references, it contains such Appendix Tables as are necessary to follow the text, and it has its own index. Now that the Kendall-Doig Bibliography of Statistical Literature is available, a comprehensive bibliography is unnecessary. As before, extensive sets of exercises are provided at the ends of chapters.

We have again to thank Mr. E. V. Burke of Charles Griffin and Company Limited for the care he has given to the production of this work.

We are also grateful to many reviewers and correspondents who have commented on errors, misprints and obscurities in the first two volumes, and shall be equally glad to be notified of any that may be found in this final volume.

M. G. K. 
A. S. 


LONDON

August, 1966
CHAPTER 35

ANALYSIS OF VARIANCE IN THE LINEAR MODEL: 

CLASSIFIED DATA 


35.1 In developing the MV unbiassed linear estimation properties of the LS estimator (19.12, Vol. 2) in the linear model $y = X\theta + \epsilon$ at (19.8), we observed at (19.42) that the sum of squares (SS) of the observations may be written identically as the sum of two non-negative components

$$y'y = (y - X\hat\theta)'(y - X\hat\theta) + (X\hat\theta)'(X\hat\theta) \tag{35.1}$$

of which the first is the sum of squared residuals (Residual SS) from the model fitted by LS. The second component on the right of (35.1) is the reduction in the SS due to the fitted model; the greater this reduction is (i.e. the smaller the Residual SS is), the more satisfactorily the fitted model represents the y-X relationship in the observations. If we rewrite (35.1) as

$$y'y = (y - X\hat\theta)'(y - X\hat\theta) + \hat\theta'(X'X)\hat\theta \tag{35.2}$$

and recall from (19.16) that

$$V(\hat\theta) = \sigma^2 (X'X)^{-1} \tag{35.3}$$

we see that if the error-vector $\epsilon$ in the model is normally distributed, $\hat\theta$, being a linear function of $\epsilon$, will by 15.4 be multinormally distributed with mean $\theta$ and dispersion matrix (35.3). The last term on the right of (35.2) is therefore the exponent of this multinormal distribution apart from the factor $-(2\sigma^2)^{-1}$, and by 24.6, $\hat\theta'X'X\hat\theta/\sigma^2$ is distributed in the non-central $\chi^2$ form (24.18) with degrees of freedom $\nu = k$ and non-central parameter

$$\lambda = \theta'\{V(\hat\theta)\}^{-1}\theta = \theta'X'X\theta/\sigma^2. \tag{35.4}$$

For brevity we write this distribution $\chi'^2(\nu, \lambda)$ as in 24.5.


35.2 This result enables us to test the hypothesis that $\theta = \theta_0$, and in particular to test

$$H_0: \theta = 0, \tag{35.5}$$

for then $\lambda$ defined by (35.4) is zero, and the distribution becomes a (central) $\chi^2$ with $k$ degrees of freedom (d.fr.). As we saw in 19.11-12, $(y - X\hat\theta)'(y - X\hat\theta)/\sigma^2$ is a $\chi^2$ with $(n-k)$ d.fr. and $y'y/\sigma^2$ is a $\chi^2$ with $n$ d.fr. when (35.5) holds. Cochran's theorem of 15.16 then shows that the two terms on the right of (35.2) are independently distributed, so that

$$F = \{\hat\theta'X'X\hat\theta/k\}/\{(y - X\hat\theta)'(y - X\hat\theta)/(n-k)\} \tag{35.6}$$

has the variance-ratio F-distribution with $(k, n-k)$ d.fr. when (35.5) holds.
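
As a numerical check on (35.2) and (35.6), the following Python sketch fits a small linear model by LS in exact rational arithmetic. The data, the two-parameter design and the helper names (`transpose`, `matmul`, `solve`) are illustrative assumptions and do not come from the text.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve the square non-singular system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

# Hypothetical data: n = 4 observations, k = 2 parameters.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 2, 5]
n, k = len(X), len(X[0])

Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [sum(x * v for x, v in zip(col, y)) for col in Xt]
theta_hat = solve(XtX, Xty)                      # LS estimator (X'X)^{-1} X'y

fitted = [sum(x * t for x, t in zip(row, theta_hat)) for row in X]
total_ss = sum(Fraction(v) ** 2 for v in y)                   # y'y
residual_ss = sum((v - f) ** 2 for v, f in zip(y, fitted))    # first term of (35.2)
# By the normal equations, theta_hat' X'y equals theta_hat' X'X theta_hat.
reduction_ss = sum(t * s for t, s in zip(theta_hat, Xty))     # second term of (35.2)

# Variance ratio (35.6) for the hypothesis theta = 0.
F = (reduction_ss / k) / (residual_ss / (n - k))
print(total_ss, residual_ss, reduction_ss, F)
```

Exact fractions are used only so that the identity can be checked with equality rather than a tolerance; floating-point arithmetic would serve equally well in practice.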

If we wish to investigate the power of a test of (35.5) based on (35.6), we require its distribution when (35.5) does not hold. In order to show that it is a non-central F as defined at (24.105), we must prove that the numerator and denominator of (35.6) remain independent when $\theta \neq 0$. For the more general hypothesis that $\theta = \theta_0 \neq 0$, we require this distribution in order to have a test at all. Thus we need an extension of Cochran's theorem (15.16) to non-central normal variables, i.e. normal variables with means not all equal.

[...] The second term on the right of (35.2) can then be further separated into

$$(X\hat\theta)'(X\hat\theta) = \sum_{i=1}^{k} c_{ii}\hat\theta_i^2. \tag{35.7}$$

The elements $c_{ii}$ are positive, since $X'X$ is a positive definite (non-singular) matrix. (35.7) expresses the reduction in the SS as the sum of $k$ parts, one corresponding to each parameter. Here again, we may be interested in testing hypotheses concerning individual $\theta_i$, and require the distribution of the components $c_{ii}\hat\theta_i^2$ when $\theta_i \neq 0$.

If $X'X$ is diagonal, so is (35.3), and the linear model is called orthogonal since the $\hat\theta_i$ are uncorrelated, and actually independent when $\epsilon$ is normal. We have already discussed orthogonal models in the context of regression theory in 28.15-20, Vol. 2, where we were concerned with the use of orthogonal polynomials. (28.72) defined (and Example 28.3 illustrated) the procedure of evaluating the reduction in the SS due to each further parameter, using an entirely intuitive justification. Our present discussion will be more general.
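
The separation (35.7) can be illustrated numerically. The sketch below assumes a hypothetical design whose two columns are orthogonal, so that $X'X$ is diagonal with elements $c_{ii}$; the data are invented for illustration.

```python
from fractions import Fraction

# Hypothetical orthogonal design: the two columns of X have zero inner product.
X = [[1, -1], [1, 0], [1, 1]]
y = [2, 3, 7]
columns = list(zip(*X))
assert sum(a * b for a, b in zip(*columns)) == 0   # so X'X is diagonal

# Diagonal elements c_ii of X'X and the LS estimators x_i'y / c_ii.
c = [sum(v * v for v in col) for col in columns]
theta_hat = [Fraction(sum(v * w for v, w in zip(col, y)), cii)
             for col, cii in zip(columns, c)]

# (35.7): the reduction in the SS is the sum of k parts c_ii * theta_hat_i^2.
parts = [cii * t ** 2 for cii, t in zip(c, theta_hat)]
fitted = [sum(x * t for x, t in zip(row, theta_hat)) for row in X]
residual_ss = sum((v - f) ** 2 for v, f in zip(y, fitted))
total_ss = sum(v * v for v in y)
print(parts, residual_ss, total_ss)
```

Each estimator is computed from its own column alone, which is the practical meaning of orthogonality here.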

Analysis of variance

35.4 [...] the concept, originally developed by R. A. Fisher [...]. When the SS of the observations is expressed as the sum of non-negative components, each of which corresponds to [...] of the linear model, we call this an analysis of variance (AV). (It would be more appropriate to call it an analysis of SS, but history [...] logical usage.) AV is intended to separate out the effects of the different subsets of parameters upon the observations y. [...]

35.5 [...] and that

$$E(y) = \mu, \qquad V(y) = I \tag{35.8}$$

[...] (35.9)

We saw there that any one of the three conditions

(a) $\sum_i r_i$ = the rank of $Q$,

(b) each $A_i$ is idempotent, i.e. $A_i = A_i^2$,

(c) $A_i A_j = 0$, $i \neq j$,

implies the other two, and that (c) is equivalent to

(c') each $Q_i$ is independent of every other.

However, the equivalence of (b) and the statement

(b°) each $Q_i$ is a (central) $\chi^2$ variable with $r_i$ d.fr.,

depended upon the result of 15.11 that if $\mu = 0$, $Q_i$ is a $\chi^2$ variable if and only if $A_i$ is idempotent. It is thus (b°) which requires to be generalized through a generalization of 15.11 to $\mu \neq 0$.

35.6 The only essential change brought about in 15.11 is that the canonically transformed variable $y_i^2$ in (15.43) is now a $\chi'^2(1, \mu_i^2)$ variable, by 24.4. The c.g.f. of $\sum a_i y_i^2$ is therefore not (15.45), but the more general form obtained from Exercise 24.1, which yields for the cumulants of $Q$

$$\kappa_s = 2^{s-1}(s-1)!\sum_{i=1}^{r} a_i^s(1 + s\mu_i^2) \tag{35.10}$$

(the generalization of (15.46)), and also shows that the cumulants of a $\chi'^2(\nu, \lambda)$ variable are

$$\kappa_s = 2^{s-1}(s-1)!\,(\nu + s\lambda). \tag{35.11}$$

For (35.10) and (35.11) to be identical, we must have

$$\sum_{i=1}^{r} a_i^s = \nu, \qquad \sum_{i=1}^{r} a_i^s\mu_i^2 = \lambda, \qquad \text{all } s. \tag{35.12}$$

(35.12) is satisfied if and only if every $a_i = 1$, so that $\nu = r$ and $\lambda = \sum \mu_i^2$. Since the $a_i$ are the non-zero latent roots of $A$, it follows that $A$ is idempotent. We thus see that, for general $\mu$, the statement equivalent to (b) above is

(b') each $Q_i$ is a $\chi'^2(r_i, \lambda_i)$ variable,

reducing to (b°) when $\mu = 0$. Moreover, if we transform orthogonally back from the canonical to the original variables, we see at once that $\lambda_i = \mu'A_i\mu$, and $\sum \lambda_i = \mu'\mu$, the non-central parameters of the $Q_i$ adding to that of $Q$ (cf. Exercise 24.1).

We have thus reached the conclusion that [...]

35.7 The result of 35.6 is unaffected if $Q = y'y$ is replaced by $Q = y'Ay$, where $A$ is any idempotent matrix [...]. The argument justifying this [...] for general $\mu$. [...] have a non-singular multinormal distribution with dispersion matrix $V$, the argument is only slightly changed. For

$$E(y) = \mu, \qquad V(y) = V, \tag{35.13}$$

and

$$Q = y'Ay = \sum_{i=1}^{k} y'A_i y, \tag{35.14}$$

we may write $V = TT'$ and the transformation $y = Tz$ produces independent normal variables $z$, since the exponent of the multinormal distribution is

$$y'V^{-1}y = z'T'(T')^{-1}T^{-1}Tz = z'z.$$

We then have, from (35.14),

$$Q = z'T'ATz = \sum_{i=1}^{k} z'T'A_iTz = \sum_{i=1}^{k} Q_i$$

and these are the quadratic forms with which we now deal. Condition (b) of 35.5 is now

$$T'A_iT = T'A_iT\,T'A_iT$$

or

$$A_iV = A_iVA_iV$$

so that $A_iV$ must now be idempotent, as must $AV$. Condition (c) is

$$T'A_iT\,T'A_jT = 0$$

or

$$A_iVA_j = 0, \qquad i \neq j.$$

Condition (a) is unaffected by orthogonal transformation.
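
The transformed conditions can be checked on a small case. In the following Python sketch the matrices `T`, `A1` and `A2` are hypothetical choices made so that the conditions hold; they are not taken from the text.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def is_idempotent(m):
    return matmul(m, m) == m

# Hypothetical dispersion matrix V = TT' with T lower triangular.
T = [[1, 0], [1, 1]]
V = matmul(T, transpose(T))

# Two quadratic-form matrices whose sum A = A1 + A2 is the inverse of V.
A1 = [[1, 0], [0, 0]]
A2 = [[1, -1], [-1, 1]]
A = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]

A1V, A2V, AV = matmul(A1, V), matmul(A2, V), matmul(A, V)
zero = [[0, 0], [0, 0]]

# Condition (b): each A_i V (and AV) is idempotent.
print(is_idempotent(A1V), is_idempotent(A2V), is_idempotent(AV))
# Condition (c): A_1 V A_2 = 0.
print(matmul(A1V, A2) == zero)
```

Note that `A1V` and `A2V` are idempotent without being symmetric, which is why the conditions are stated for the products with $V$ rather than for the $A_i$ themselves.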

We may therefore finally state the general results:

If $y$ is non-singularly multinormal with moments (35.13), and the decomposition (35.14) holds for a quadratic form $Q$ [...] where $r$ is the rank of $A$ [...]

(a) $\sum_i r_i = r$ [...]

[...] $Q_i$ is independent of every other.

[...] simplifies their proofs [...] more general results than [...]
Similarly, we now see that for any individual $\theta_i$ in the orthogonal model of 35.3, the ratio

$$F = c_{ii}\hat\theta_i^2/\{(y - X\hat\theta)'(y - X\hat\theta)/(n-k)\} \tag{35.15}$$

is a $F'(1, n-k, c_{ii}\theta_i^2/\sigma^2)$ variable, and may then be used to test hypotheses concerning $\theta_i$.

More generally, for a hypothesis $H_0$ imposing $r \leq k$ constraints, the ratio of the SS due to $H_0$ and the Residual SS, multiplied by $(n-k)/r$, is a $F'(r, n-k, \lambda)$ variable (cf. 24.29-31). It should be particularly noted from 24.29 that the non-central parameter $\lambda$ is always of exactly the same form as the numerator SS of the test statistic with each observation replaced by its expectation, and $\sigma^2$ as a divisor. Thus we may always obtain $\lambda$ very simply from the numerator SS in the test statistic by substituting $\theta$ for $\hat\theta$ and dividing by $\sigma^2$.

These are examples of the LR test in the linear model, derived generally through 
the canonical form of the model in 24.25-9. The discussion below (24.100) explicitly 
pointed out that the LR test of a linear hypothesis concerning any subset of r of the k 
parameters is based upon the reduction in the SS due to these r parameters divided 
by the Residual SS. The canonical approach of Chapter 24 had its theoretical uses 
in the derivation of optimum properties of LR tests in 24.36-7. For our present 

purposes, the equivalent partitioning of SS which we have been discussing is more 
direct and informative. 

We remind the reader that exact and approximate expressions for the power function 
of the LR F-tests are given in 24.32-3. 


AV for classified observations 

35.9 Our definition of AV in 35.4 applies to any linear model, and covers the applications to regression theory in 28.12-23. However, the term AV is commonly used in a narrower sense, in which it was originally developed.

We saw in 35.4 that AV is used to separate out the influences of different parameters upon y. In experimental work, the parameters are often the effects of certain “treatments” upon the variable y. For example, in agricultural experimentation, from which this terminology derives, y might be the yield of wheat from a plot of fixed size and the “treatment” being investigated might be the addition of a certain fertilizer to the plot during the growing season. Naturally, the experiment would include both treated and untreated plots. The point here is that such an experiment may be brought within the scope of the general linear model by defining a “label” variable $x$ which is equal to 1 when the treatment is given and 0 otherwise.

It is easy to see that any pattern of treatments can be handled in this way; we need only define a label variable $x$ for each possible ingredient of the treatments in the experiment. If there are two fertilizers in the example of the previous paragraph we should define $x_1$ as the label variable for the first and $x_2$ as the label variable for the second fertilizer. Thus, a plot which receives both fertilizers has $x_1 = x_2 = 1$; one which receives only the first has $x_1 = 1$, $x_2 = 0$; a plot which receives only the second fertilizer has $x_1 = 0$, $x_2 = 1$; and a plot receiving no fertilizer has $x_1 = x_2 = 0$. The analysis of the linear model can now proceed without difficulty, since the elements of X may be any known constants.
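
The construction of label variables is easily mechanized. A minimal Python sketch, assuming a hypothetical record for each plot stating which of the two fertilizers it received (the field names `first` and `second` are invented for illustration):

```python
# Hypothetical plot records: which of the two fertilizers each plot received.
plots = [
    {"first": True,  "second": True},
    {"first": True,  "second": False},
    {"first": False, "second": True},
    {"first": False, "second": False},
]

def label_row(plot):
    """Return the label variables (x1, x2): 1 if the fertilizer was given, else 0."""
    return [int(plot["first"]), int(plot["second"])]

# Each plot contributes one row of the matrix X of label variables.
X = [label_row(p) for p in plots]
print(X)
```

Only the two label columns are built here; whether a further column is wanted for a general mean depends on how the model is parametrized.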





35.10 The feature of the matrix X in the examples of 35.9 is that all its elements are units or zeros, since the $x$ are merely labels for the presence or absence of certain ingredients in the “treatments.” In the narrower sense, the term AV is used to describe the analysis of a linear model when this holds true for all the elements of X. Other integers are sometimes permitted in X in this context. Thus if, in the single-fertilizer experiment discussed at the beginning of 35.9, some plots receive a single dose, others a double dose, and others none at all of the fertilizer, we might put $x = 2$, 1 or 0 accordingly. However, this formulation suffers from the implication that the effect of the fertilizer is linear in the dose. Alternatively, we may define $x_1$ to denote presence or absence of a single dose, and $x_2$ to denote presence or absence of a double dose of the fertilizer. This alternative formulation does not (as the reader may be tempted to think) reduce the model to the two-fertilizer model at the end of 35.9, for we cannot now have $x_1 = x_2 = 1$ for any plot; there is evidently some loss of symmetry to offset the avoidance of the implication of linearity in dose-effect.

35.11 We shall be discussing the formulation of linear models in several important AV situations, and we shall see that the simple (usually 0-1) structure of the elements of X produces corresponding simplifications in the analysis itself. The simplest case is that of a classification of observations into groups suspected to differ in their means; this is usually known as a one-way classification.

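Example 35.1 One-way classification

Suppose that $n$ observations are classified into $k$ groups, with $n_i$ observations in the $i$th group and $\sum_{i=1}^{k} n_i = n$. If the groups can differ only in their means, we may express this as $y_{ij} = \theta_i + \epsilon_{ij}$, $j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, k$, so that

$$X_{(n \times k)} = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix} \begin{matrix} \}\ n_1 \text{ rows} \\ \}\ n_2 \text{ rows} \\ \vdots \\ \}\ n_k \text{ rows} \end{matrix}$$

where each 1 is a column vector of units. The zero elements of X are omitted. We see at once that

$$X'X = \begin{pmatrix} n_1 & & & 0 \\ & n_2 & & \\ & & \ddots & \\ 0 & & & n_k \end{pmatrix}$$

so that the analysis is orthogonal (cf. 35.3) whatever the values of the $n_i$; in particular, they need not be equal. Also

$$X'y = \Big(\sum_j y_{1j}, \sum_j y_{2j}, \ldots, \sum_j y_{kj}\Big)'$$

so the LS estimator of $\theta$ is

$$\hat\theta = (X'X)^{-1}X'y = (y_{1.}, y_{2.}, \ldots, y_{k.})',$$

where $y_{i.} = \sum_j y_{ij}/n_i$ is the mean(*) of the observations in the $i$th group. The estimator is in accordance with intuition since the observations are independent. We have

$$X\hat\theta = \begin{pmatrix} y_{1.}1 \\ y_{2.}1 \\ \vdots \\ y_{k.}1 \end{pmatrix} \begin{matrix} \}\ n_1 \text{ rows} \\ \}\ n_2 \text{ rows} \\ \vdots \\ \}\ n_k \text{ rows} \end{matrix}$$

(*) A dot replacing a suffix of y denotes averaging, and not summation as we did for frequencies and probabilities in Chapter 33.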





[...] The Residual SS is

$$S_R = (y - X\hat\theta)'(y - X\hat\theta) = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - y_{i.})^2,$$

and the SS due to the fitted model is

$$S_1 = \hat\theta'X'X\hat\theta = \sum_{i=1}^{k} n_i y_{i.}^2.$$

[...] To test the hypothesis

$$H_1: \theta = 0 \tag{35.18}$$

we use the statistic

$$F = \frac{n-k}{k}\,\frac{S_1}{S_R}, \tag{35.19}$$

which is distributed as a $F'(k, n-k, \sum_i n_i\theta_i^2/\sigma^2)$ variable, reducing to a (central) $F(k, n-k)$ variable if $H_1$ holds.

However, (35.18) is not the hypothesis of principal interest in most practical situations, where we usually wish to test whether the $\theta_i$ are all equal, without specifying their common value. Instead of (35.18), we therefore test the composite hypothesis

$$H_2: \theta_1 = \theta_2 = \cdots = \theta_k, \tag{35.20}$$

which imposes only $(k-1)$ constraints. If $H_2$ holds, the $n$ observations are identically distributed with common mean

$$\theta_. = \sum_{i=1}^{k} n_i\theta_i/n.$$

The LS estimator of $\theta_.$ is then the overall sample mean

$$\hat\theta_. = y_{..} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij} = \sum_{i=1}^{k} n_i y_{i.}/n.$$

If 1 is a $(n \times 1)$ vector of units, we may rewrite the linear model temporarily in the (singular) form

[...] (35.21)

[...]
35.8 now gives for the test statistic

$$F = \frac{n-k}{k-1}\,\frac{S_2}{S_R} \tag{35.22}$$

which is distributed as a $F'(k-1, n-k, \sum_i n_i(\theta_i - \theta_.)^2/\sigma^2)$ variable, reducing to a central $F(k-1, n-k)$ when $H_2$ holds.

For computational purposes, $S_2$ and $S_R$ are usually written as

$$S_2 = \sum_{i=1}^{k} n_i y_{i.}^2 - n y_{..}^2, \qquad S_R = \sum_i\sum_j y_{ij}^2 - \sum_{i=1}^{k} n_i y_{i.}^2, \tag{35.23}$$

and the results assembled in a table:

AV table for a one-way classification

Variation         D.fr.    SS                               Mean square (MS) = SS/d.fr.
Between groups    k - 1    S_2                              S_2/(k - 1)
Within groups     n - k    S_R                              S_R/(n - k)
                  n - 1    S_2 + S_R = y'y - n y_{..}^2
General mean      1        S_1 - S_2 = n y_{..}^2           n y_{..}^2
Total             n        S_1 + S_R = y'y
                                                                            (35.24)

The “General mean” row of (35.24) is generally omitted as of no interest; the variance-ratio test based on the ratio of $n y_{..}^2$ to $S_R/(n-k)$ is, of course, the ordinary “Student's” $t^2$ test for the mean, i.e. it has a $F(1, n-k)$ distribution when $\theta_. = 0$. The test (35.22) is simply the ratio of the “Between groups” MS to the “Within groups” MS, while (35.19) is obtained by adding together the “Between groups” and “General mean” rows of the table and taking the ratio of the resulting MS to the “Within groups” MS.
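
The entries of a one-way AV table are simple to compute directly. The following Python sketch assumes a small invented data set in three groups of unequal size; the variable names follow the notation of the table but are otherwise arbitrary.

```python
from fractions import Fraction

# Hypothetical observations classified into k = 3 groups of unequal size.
groups = [[3, 5, 4], [6, 8], [1, 2, 3, 2]]
k = len(groups)
n = sum(len(g) for g in groups)

group_means = [Fraction(sum(g), len(g)) for g in groups]
overall_mean = Fraction(sum(sum(g) for g in groups), n)

# Within-groups (Residual) SS and between-groups SS.
S_R = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
S_2 = sum(len(g) * (m - overall_mean) ** 2 for g, m in zip(groups, group_means))
general_mean_ss = n * overall_mean ** 2
total_ss = sum(v * v for g in groups for v in g)      # y'y

# Ratio of the "Between groups" MS to the "Within groups" MS.
F = (S_2 / (k - 1)) / (S_R / (n - k))

rows = [("Between groups", k - 1, S_2), ("Within groups", n - k, S_R),
        ("General mean", 1, general_mean_ss), ("Total", n, total_ss)]
for name, df, ss in rows:
    print(f"{name:15s} {df:3d} {float(ss):10.4f}")
print("F =", float(F))
```

The three component SS add exactly to the total, whatever the group sizes, which reflects the orthogonality of the one-way model.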


AV identities and their geometrical interpretations 

35.12 The general theory of the linear model has been used in Example 35.1, but the final result can be less formally derived as follows. The identity

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - y_{..})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - y_{i.})^2 + \sum_{i=1}^{k} n_i(y_{i.} - y_{..})^2 \tag{35.25}$$

splits the SS of the observations about their overall mean into a SS “within groups” and a SS “between groups” (i.e. between group means). If it can be verified that the two sums on the right of (35.25) are independently distributed in the $\chi'^2$ form, the ratio of the second to the first is an intuitively acceptable criterion for testing the equality of the group means in the population. This approach leads to the same test statistic as before, but it offers no direct justification for the choice of this particular test statistic, for which the general theory is necessary. In more complicated situations, the approach through algebraic identities like (35.25) is often much simpler and easier than the direct use of linear model theory, but care is necessary in splitting the SS; ultimately, safety lies only in checking with the general theory.

35.13 The Pythagorean form of (35.25) has the virtue of drawing attention to a geometrical interpretation of the algebraic partitioning of the SS which is the essence of AV. We saw in Example 11.7 that the simpler identity (for a single group of observations)

$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n}(y_i - \bar y)^2 + n\bar y^2 \tag{35.26}$$

is geometrically equivalent to projecting the point $y = (y_1, y_2, \ldots, y_n)$ in the n-dimensional sample space upon the equiangular vector, which it meets at $(\bar y, \bar y, \ldots, \bar y)$, and applying Pythagoras' theorem in the resulting right-angled triangle. In the more general notation which we have been using in Example 35.1 and in (35.25), (35.26) is

$$\sum_i\sum_j y_{ij}^2 = \sum_i\sum_j (y_{ij} - y_{..})^2 + n y_{..}^2, \tag{35.27}$$

[...] the splitting-off of the “General mean” row from the Total SS in (35.24) [...] $S_R$ defined above (35.16), and $\sum_i n_i(y_{i.} - y_{..})^2$ being the squared distance from [...] number of components relevant to the problem in hand.
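
The geometrical reading of the single-group identity (35.26) can be verified directly. In this Python sketch the four observations are invented; the names `projection` and `residual` are illustrative.

```python
from fractions import Fraction

y = [2, 5, 8, 9]                      # hypothetical single group of observations
n = len(y)
mean = Fraction(sum(y), n)

projection = [mean] * n               # foot of the perpendicular on the equiangular vector
residual = [v - mean for v in y]      # vector from that foot to the point y

# The projection and the residual are at right angles ...
inner_product = sum(p * r for p, r in zip(projection, residual))
# ... so Pythagoras' theorem gives the two sides of (35.26).
lhs = sum(v * v for v in y)
rhs = sum(r * r for r in residual) + n * mean ** 2
print(inner_product, lhs, rhs)
```

The zero inner product is what makes the cross-product term vanish when the SS is split.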

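35.14 The fact [...]

[...] a two-way [...] most of the points [...] with frequencies

$$\begin{array}{cccc|c} n_{11} & n_{12} & \cdots & n_{1c} & n_{1.} \\ n_{21} & n_{22} & \cdots & n_{2c} & n_{2.} \\ \vdots & \vdots & & \vdots & \vdots \\ n_{r1} & n_{r2} & \cdots & n_{rc} & n_{r.} \\ \hline n_{.1} & n_{.2} & \cdots & n_{.c} & n \end{array} \tag{35.28}$$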
Although (35.28) is formally identical with (33.60), our present problem is distinguished from those of Chapter 33 by the fact that the value of y is here known for each observation, whereas there only the frequencies in the cells of the table were known. We express this distinction by referring here to a $r \times c$ classification as opposed to the term categorization used in Chapter 33. We continue the convention of Chapter 33 that a dot replacing a suffix to $n$ denotes summation over that suffix, while for the variable y we continue the convention set up in Example 35.1 that a dot replacing a suffix denotes averaging over that suffix. Together, these two conventions simplify the notation in what follows. The reader will see that the grand total frequency in (35.28) should strictly be written $n_{..}$, but we continue to write $n$ instead in this one case to denote “sample size.”

We may, of course, treat the $rc$ cells in the body of the table (35.28) as a one-way classification (Example 35.1) with $k = rc$. However, the questions which are usually asked about the two-way cross-classification (35.28) are:

(1) Do the means of the row-classification (with frequencies $n_{1.}, n_{2.}, \ldots, n_{r.}$) differ?

(2) Do the means of the column-classification (with frequencies $n_{.1}, n_{.2}, \ldots, n_{.c}$) differ?

(3) Is there any interrelation between row- and column-means?

More rarely, we ask also

(4) Does the mean of the whole set of $n$ observations differ from some hypothetical value?


35.16 Denote the $p$th observation in the $i$th row and $j$th column of the table by $y_{ijp}$. We then have, in our notational convention,

$$\begin{aligned}
y_{ij.} &= \sum_{p=1}^{n_{ij}} y_{ijp}/n_{ij}, \\
y_{i..} &= \sum_{j=1}^{c}\sum_{p=1}^{n_{ij}} y_{ijp}/n_{i.} = \sum_{j=1}^{c} n_{ij}y_{ij.}/n_{i.} = \sum_{j=1}^{c} n_{ij}y_{ij.}\Big/\sum_{j=1}^{c} n_{ij}, \\
y_{.j.} &= \sum_{i=1}^{r}\sum_{p=1}^{n_{ij}} y_{ijp}/n_{.j} = \sum_{i=1}^{r} n_{ij}y_{ij.}/n_{.j} = \sum_{i=1}^{r} n_{ij}y_{ij.}\Big/\sum_{i=1}^{r} n_{ij}, \\
y_{...} &= \sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{p=1}^{n_{ij}} y_{ijp}/n = \sum_{i=1}^{r} n_{i.}y_{i..}/n = \sum_{j=1}^{c} n_{.j}y_{.j.}/n.
\end{aligned} \tag{35.29}$$

An easy way of avoiding any possible confusion in notation is to define a dummy variable $n_{ijp}$ which is identically equal to 1 for all $p = 1, 2, \ldots, n_{ij}$. Then (35.29) becomes

$$\begin{aligned}
y_{ij.} &= \sum_p y_{ijp}\Big/\sum_p n_{ijp}, \\
y_{i..} &= \sum_j\sum_p y_{ijp}\Big/\sum_j\sum_p n_{ijp}, \\
y_{.j.} &= \sum_i\sum_p y_{ijp}\Big/\sum_i\sum_p n_{ijp}, \\
y_{...} &= \sum_i\sum_j\sum_p y_{ijp}\Big/\sum_i\sum_j\sum_p n_{ijp}.
\end{aligned} \tag{35.30}$$

Here and hereafter, unless otherwise stated, $i$ is summed from 1 to $r$; $j$ from 1 to $c$; and $p$ is summed from 1 to $n_{ij}$.
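
The two dot conventions (summation for frequencies, averaging for observations) can be made concrete. The Python sketch below assumes an invented $2 \times 2$ classification; `cells[i][j]` is a hypothetical container for the observations $y_{ijp}$.

```python
from fractions import Fraction

# Hypothetical 2 x 2 classification: cells[i][j] lists the observations y_ijp.
cells = [[[4], [6, 8]],
         [[1, 3], [5, 7, 9, 3]]]
r, c = len(cells), len(cells[0])

# Frequencies: a dot replacing a suffix of n denotes summation.
n_ij = [[len(cells[i][j]) for j in range(c)] for i in range(r)]
n_row = [sum(n_ij[i]) for i in range(r)]
n_col = [sum(n_ij[i][j] for i in range(r)) for j in range(c)]
n = sum(n_row)

# Observations: a dot replacing a suffix of y denotes averaging.
y_cell = [[Fraction(sum(cells[i][j]), n_ij[i][j]) for j in range(c)] for i in range(r)]
# Row, column and overall means as weighted averages of the cell means.
y_row = [sum(n_ij[i][j] * y_cell[i][j] for j in range(c)) / n_row[i] for i in range(r)]
y_col = [sum(n_ij[i][j] * y_cell[i][j] for i in range(r)) / n_col[j] for j in range(c)]
y_all = sum(n_row[i] * y_row[i] for i in range(r)) / n
print(y_cell, y_row, y_col, y_all)
```

The overall mean may equally be reached through the column means, as the last line of (35.29) states.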

35.17 In formulating the linear model, we require only the means of the observations in each cell of the $r \times c$ table. To answer the questions posed in 35.15, however, we express the mean in terms of:

$\mu_{..}$, a mean common to all observations;

$\mu_{i.}$, a mean common to all observations in the $i$th row;

$\mu_{.j}$, a mean common to all observations in the $j$th column.

Since we already have the cell means $\mu_{ij}$ common to observations in the $i$th row and the $j$th column, we now have $1 + r + c + rc$ parameters, of which only $rc$ can be linearly independent. The singularity which we have introduced by our choice of parameters can easily be removed, either by the augmentation technique of 19.14-15, Vol. 2, or by eliminating the redundant parameters, as we shall do here.

Once $\mu_{..}$ is defined, any $(r-1)$ of the means $\mu_{i.}$ determine the other one; similarly, only $(c-1)$ of the means $\mu_{.j}$ need be considered, since they with $\mu_{..}$ will determine the other one. Once the $\mu_{i.}$ and $\mu_{.j}$ are thus determined, it is easy to see that only $(r-1)(c-1)$ of the $\mu_{ij}$ can be independently determined (cf. the d.fr. in 33.29). We may thus confine ourselves to $(r-1)$ parameters $\mu_{i.}$ (omitting $\mu_{r.}$, say), to $(c-1)$ parameters $\mu_{.j}$ (omitting $\mu_{.c}$, say), and to $(r-1)(c-1)$ parameters $\mu_{ij}$ (say, $i = 1, 2, \ldots, r-1$ and $j = 1, 2, \ldots, c-1$). These, with $\mu_{..}$, make up the $rc$ parameters required for the model to be non-singular.

It should be noticed that we do not define the parameters $\mu_{..}$, $\mu_{i.}$, $\mu_{.j}$ except to state that they are (weighted) means of the $\mu_{ij}$.


35.18 We now define

$$\begin{aligned}
\theta_{..} &= \mu_{..}, \\
\theta_{i.} &= \mu_{i.} - \mu_{..}, \\
\theta_{.j} &= \mu_{.j} - \mu_{..}, \\
\theta_{ij} &= \mu_{ij} - \mu_{i.} - \mu_{.j} + \mu_{..},
\end{aligned} \tag{35.31}$$

and write the linear model in the form

$$y_{ijp} = \theta_{..} + \theta_{i.} + \theta_{.j} + \theta_{ij} + \epsilon_{ijp} = \mu_{ij} + \epsilon_{ijp}. \tag{35.32}$$

For obvious reasons, $\theta_{..}$ is called the general mean, and $\theta_{i.}$ and $\theta_{.j}$ are respectively called the $i$th row-effect and the $j$th column-effect, measuring the [...] mean in a particular row [...] row and $j$th column “act additively” or “do not interact.” $\theta_{ij}$ as defined in (35.31) measures departures from this situation, and is called the interaction between the $i$th row and the $j$th column.

35.19 The $(r + c + 1)$ linear relations between the parameters, discussed in 35.17, may now be written (but we shall return to this subject in 35.26-8 below)

$$\begin{aligned}
0 &= \sum_{i=1}^{r} n_{i.}\theta_{i.} = \sum_{j=1}^{c} n_{.j}\theta_{.j} \\
&= \sum_{i=1}^{r} n_{ij}\theta_{ij}, \qquad j = 1, 2, \ldots, c-1, \\
&= \sum_{j=1}^{c} n_{ij}\theta_{ij}, \qquad i = 1, 2, \ldots, r-1, \\
&= \sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}\theta_{ij}.
\end{aligned} \tag{35.33}$$

If, as in 35.17, we define the parameters $\theta$ in (35.32) for $i = 1, 2, \ldots, r-1$ and $j = 1, 2, \ldots, c-1$ only, the eliminated $(r + c + 1)$ parameters may be expressed in terms of the others, using (35.33), as

$$\begin{aligned}
\theta_{r.} &= -\sum_{i=1}^{r-1} n_{i.}\theta_{i.}/n_{r.}, \\
\theta_{.c} &= -\sum_{j=1}^{c-1} n_{.j}\theta_{.j}/n_{.c}, \\
\theta_{rj} &= -\sum_{i=1}^{r-1} n_{ij}\theta_{ij}/n_{rj}, \qquad j = 1, 2, \ldots, c-1, \\
\theta_{ic} &= -\sum_{j=1}^{c-1} n_{ij}\theta_{ij}/n_{ic}, \qquad i = 1, 2, \ldots, r-1, \\
\theta_{rc} &= +\sum_{i=1}^{r-1}\sum_{j=1}^{c-1} n_{ij}\theta_{ij}/n_{rc}.
\end{aligned} \tag{35.34}$$

35.20 We may now write down the matrix X of the linear model (35.32). It is not a matrix of units and zeros only, because the expression of the eliminated parameters in terms of the others, in (35.34), involves various ratios of the $n$'s.

To simplify the reader's verification of the elements of the matrix in (35.35) its columns are headed by the parameters to which they correspond and its rows are bordered by the frequencies in the cells to which they apply. Only non-zero elements of X are shown. Throughout the matrix, a vector of units 1 contains a number of components equal to the sum of the cell-frequencies (in the border of the rows) over which the vector 1 physically extends in (35.35).
Premultiplying (35.35) by its transpose, we find

$$X'X_{(rc \times rc)} = \begin{pmatrix} n & 0' & 0' & 0' \\ 0 & A & D & 0 \\ 0 & D' & B & 0 \\ 0 & 0 & 0 & C \end{pmatrix} \tag{35.36}$$

in which the successive blocks of rows and columns are of orders 1, $(r-1)$, $(c-1)$ and $(r-1)(c-1)$. [...] given by


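$$A_{(r-1)\times(r-1)} = \begin{pmatrix}
n_{1.} + n_{1.}^2/n_{r.} & n_{1.}n_{2.}/n_{r.} & \cdots & n_{1.}n_{r-1,.}/n_{r.} \\
n_{2.}n_{1.}/n_{r.} & n_{2.} + n_{2.}^2/n_{r.} & \cdots & n_{2.}n_{r-1,.}/n_{r.} \\
\vdots & \vdots & \ddots & \vdots \\
n_{r-1,.}n_{1.}/n_{r.} & n_{r-1,.}n_{2.}/n_{r.} & \cdots & n_{r-1,.} + n_{r-1,.}^2/n_{r.}
\end{pmatrix} \tag{35.37}$$

$$B_{(c-1)\times(c-1)} = \begin{pmatrix}
n_{.1} + n_{.1}^2/n_{.c} & n_{.1}n_{.2}/n_{.c} & \cdots & n_{.1}n_{.,c-1}/n_{.c} \\
n_{.2}n_{.1}/n_{.c} & n_{.2} + n_{.2}^2/n_{.c} & \cdots & n_{.2}n_{.,c-1}/n_{.c} \\
\vdots & \vdots & \ddots & \vdots \\
n_{.,c-1}n_{.1}/n_{.c} & n_{.,c-1}n_{.2}/n_{.c} & \cdots & n_{.,c-1} + n_{.,c-1}^2/n_{.c}
\end{pmatrix} \tag{35.38}$$

and the $(r-1)(c-1) \times (r-1)(c-1)$ matrix C is more complicated. If we label its rows by $(k, l)$ and its columns by $(m, q)$, its elements are

$$C_{(kl),(mq)} = \begin{cases}
n_{kl} + n_{kl}^2\left(\dfrac{1}{n_{kc}} + \dfrac{1}{n_{rl}} + \dfrac{1}{n_{rc}}\right) & \text{if } k = m,\ l = q, \\
n_{kl}n_{kq}\left(\dfrac{1}{n_{kc}} + \dfrac{1}{n_{rc}}\right) & \text{if } k = m,\ l \neq q, \\
n_{kl}n_{ml}\left(\dfrac{1}{n_{rl}} + \dfrac{1}{n_{rc}}\right) & \text{if } k \neq m,\ l = q, \\
n_{kl}n_{mq}/n_{rc} & \text{if } k \neq m,\ l \neq q.
\end{cases} \tag{35.39}$$

The only remaining non-null matrix in (35.36) is D of order $(r-1) \times (c-1)$, whose $(i,j)$th element is

$$D_{ij} = n_{.j}\left(\frac{n_{ij}}{n_{.j}} - \frac{n_{ic}}{n_{.c}}\right) - \frac{n_{i.}n_{.j}}{n_{r.}}\left(\frac{n_{rj}}{n_{.j}} - \frac{n_{rc}}{n_{.c}}\right). \tag{35.40}$$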

35.21 In general, $X'X$ at (35.36) can only be inverted numerically as in the general LS procedure, but inspection reveals that if we can make $D = 0$, the matrix will be of the form

$$X'X = \begin{pmatrix} n & & & \\ & A & & \\ & & B & \\ & & & C \end{pmatrix} \tag{35.41}$$

whose inverse is simply

$$(X'X)^{-1} = \begin{pmatrix} n^{-1} & & & \\ & A^{-1} & & \\ & & B^{-1} & \\ & & & C^{-1} \end{pmatrix} \tag{35.42}$$

We are therefore led to examine the conditions under which $D = 0$, i.e. every element $D_{ij}$ defined by (35.40) is zero. The structure of $D_{ij}$ makes it evident that this will be so if and only if

$$\frac{n_{ij}}{n_{.j}} = \frac{n_{ic}}{n_{.c}}, \qquad j = 1, 2, \ldots, c-1;\ i = 1, 2, \ldots, r-1,$$

and also

$$\frac{n_{rj}}{n_{.j}} = \frac{n_{rc}}{n_{.c}}.$$

These conditions are simply that every cell-frequency $n_{ij}$ be proportional to its column-total frequency $n_{.j}$. It follows that every $n_{ij}$ must then also be proportional to its row-total frequency $n_{i.}$, and that we must have

$$n_{ij} = n_{i.}n_{.j}/n, \qquad \text{all } i, j. \tag{35.43}$$
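
Condition (35.43) is easy to test for a given table of cell-frequencies. A minimal Python sketch, with a hypothetical function name `is_proportional` and invented tables:

```python
from fractions import Fraction

def is_proportional(table):
    """Check condition (35.43): n_ij = n_i. * n_.j / n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return all(Fraction(row_totals[i] * col_totals[j], n) == table[i][j]
               for i in range(len(table)) for j in range(len(table[0])))

print(is_proportional([[2, 4], [3, 6]]))   # hypothetical table with proportional rows
print(is_proportional([[2, 4], [3, 5]]))   # the same table with one cell altered
```

Exact rational comparison is used so that no tolerance has to be chosen.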




The proportional-frequencies case

35.22 We first observe that the form of (35.41) implies that the LS estimator of the general mean $\theta_{..}$ is orthogonal to those of all other parameters, and similarly that the $(r-1)$ linearly independent row-effects are estimated orthogonally to the $(c-1)$ linearly independent column-effects and [...] group of parameters in which we are interested.

The reader will, perhaps, have observed that we have not yet evaluated the LS estimators themselves. The reason for this is that even when the proportional-frequencies condition (35.43) holds, the elements of C given at (35.39) are not such as to make its inversion simple, although of course we may evaluate $C^{-1}$ numerically in any given situation. Fortunately, however, we may use the orthogonalities referred to in the preceding paragraph to obtain the LS estimators of the row- and column-effects at once, and use them later to evaluate the LS estimators of the interactions. To do this, we need only invert A and B at (35.37-8), and use (35.42) to evaluate the first $1 + (r-1) + (c-1) = r + c - 1$ rows of the $(rc \times 1)$ LS estimator vector.

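35.23 It is easily verified by matrix multiplication that the inverse of (35.37) is

$$A^{-1} = \begin{pmatrix}
\dfrac{1}{n_{1.}} - \dfrac{1}{n} & -\dfrac{1}{n} & \cdots & -\dfrac{1}{n} \\
-\dfrac{1}{n} & \dfrac{1}{n_{2.}} - \dfrac{1}{n} & \cdots & -\dfrac{1}{n} \\
\vdots & \vdots & \ddots & \vdots \\
-\dfrac{1}{n} & -\dfrac{1}{n} & \cdots & \dfrac{1}{n_{r-1,.}} - \dfrac{1}{n}
\end{pmatrix} \tag{35.44}$$

and similarly that (35.38) has inverse

$$B^{-1} = \begin{pmatrix}
\dfrac{1}{n_{.1}} - \dfrac{1}{n} & -\dfrac{1}{n} & \cdots & -\dfrac{1}{n} \\
-\dfrac{1}{n} & \dfrac{1}{n_{.2}} - \dfrac{1}{n} & \cdots & -\dfrac{1}{n} \\
\vdots & \vdots & \ddots & \vdots \\
-\dfrac{1}{n} & -\dfrac{1}{n} & \cdots & \dfrac{1}{n_{.,c-1}} - \dfrac{1}{n}
\end{pmatrix} \tag{35.45}$$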
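
The matrix multiplication referred to can be carried out numerically. The Python sketch below assumes hypothetical row totals for $r = 3$ and checks that a matrix with diagonal elements $1/n_{i.} - 1/n$ and off-diagonal elements $-1/n$ is the inverse of A.

```python
from fractions import Fraction

row_totals = [6, 9, 5]                  # hypothetical n_1., n_2., n_3. (r = 3)
n = sum(row_totals)
n_r = row_totals[-1]
head = row_totals[:-1]                  # the r - 1 retained row totals
m = len(head)

# A: diagonal matrix of the n_i. plus the products n_i. n_k. / n_r.
A = [[(head[i] if i == j else 0) + Fraction(head[i] * head[j], n_r)
      for j in range(m)] for i in range(m)]
# Candidate inverse: 1/n_i. - 1/n on the diagonal, -1/n elsewhere.
A_inv = [[(Fraction(1, head[i]) if i == j else 0) - Fraction(1, n)
          for j in range(m)] for i in range(m)]

product = [[sum(A[i][t] * A_inv[t][j] for t in range(m)) for j in range(m)]
           for i in range(m)]
print(product)
```

The same check applies to B with column totals in place of row totals.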


-- ' i-i / 

,hey are pred ^SSr in Cha ^ « - - 

6 ,and0m enables, where 
We now require only the first $r + c - 1$ rows of the $(rc \times 1)$ vector $X'y$. From (35.35), these are seen to be, in the notation of (35.29),

$$(X'y)_{r+c-1} = \begin{pmatrix}
n y_{...} \\
n_{1.}(y_{1..} - y_{r..}) \\
n_{2.}(y_{2..} - y_{r..}) \\
\vdots \\
n_{r-1,.}(y_{r-1,..} - y_{r..}) \\
n_{.1}(y_{.1.} - y_{.c.}) \\
n_{.2}(y_{.2.} - y_{.c.}) \\
\vdots \\
n_{.,c-1}(y_{.,c-1,.} - y_{.c.})
\end{pmatrix} \tag{35.46}$$

Using (35.42) and (35.44-6) we find, for the first $(r + c - 1)$ components of the LS estimator $(X'X)^{-1}X'y$,

$$\begin{pmatrix}
\hat\theta_{..} \\ \hat\theta_{1.} \\ \vdots \\ \hat\theta_{r-1,.} \\ \hat\theta_{.1} \\ \vdots \\ \hat\theta_{.,c-1}
\end{pmatrix} = \begin{pmatrix}
y_{...} \\ y_{1..} - y_{...} \\ \vdots \\ y_{r-1,..} - y_{...} \\ y_{.1.} - y_{...} \\ \vdots \\ y_{.,c-1,.} - y_{...}
\end{pmatrix} \tag{35.47}$$

Thus the LS estimator of the general mean is the overall sample mean, and the LS estimators of row- (or column-) effects are the sample differences between row- (or column-) means and the overall sample mean. It follows at once from (35.47) and the first two linear relationships in (35.34) that the same holds true for the eliminated (redundant) row- and column-effects, i.e. that

$$\hat\theta_{r.} = y_{r..} - y_{...}, \qquad \hat\theta_{.c} = y_{.c.} - y_{...}. \tag{35.48}$$


35.24 Substituting (35.47-8) into the definition of the interactions $\theta_{ij}$ in (35.31), we see that

$$\hat\theta_{ij} = \hat\mu_{ij} - y_{i..} - y_{.j.} + y_{...}, \tag{35.49}$$

since $\hat\theta_{ij}$ is a linear function of the other quantities (cf. 19.6, Vol. 2). Now, clearly, from the extreme right of (35.32), we must have the LS estimator

$$\hat\mu_{ij} = y_{ij.},$$

and thus (35.49) becomes

$$\hat\theta_{ij} = y_{ij.} - y_{i..} - y_{.j.} + y_{...}. \tag{35.50}$$

Thus all the parameters are estimated, in this proportional-frequencies case, by the “obvious” intuitive estimators.
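
These estimators can be computed for a small case. The Python sketch below assumes an invented $2 \times 2$ data set whose cell-frequencies are proportional, and evaluates the general mean, the row- and column-effects, and the interactions; the helper `mean` is an illustrative name.

```python
from fractions import Fraction

# Hypothetical 2 x 2 data with proportional frequencies n_ij = [[1, 2], [2, 4]].
cells = [[[4], [6, 8]],
         [[1, 3], [5, 7, 9, 3]]]
r, c = len(cells), len(cells[0])
n_ij = [[len(cells[i][j]) for j in range(c)] for i in range(r)]
n_row = [sum(row) for row in n_ij]
n_col = [sum(col) for col in zip(*n_ij)]

def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))

y_cell = [[mean(cells[i][j]) for j in range(c)] for i in range(r)]
y_row = [mean(v for j in range(c) for v in cells[i][j]) for i in range(r)]
y_col = [mean(v for i in range(r) for v in cells[i][j]) for j in range(c)]
y_all = mean(v for row in cells for cell in row for v in cell)

general_mean = y_all                                    # overall sample mean
row_effects = [m - y_all for m in y_row]                # row mean minus overall mean
col_effects = [m - y_all for m in y_col]                # column mean minus overall mean
interactions = [[y_cell[i][j] - y_row[i] - y_col[j] + y_all
                 for j in range(c)] for i in range(r)]
print(general_mean, row_effects, col_effects, interactions)
```

With proportional frequencies the estimated effects satisfy weighted-sum relations of the same form as those imposed on the parameters, which gives a convenient check on the arithmetic.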

Now that the LS estimators of all the parameters in our model are known we may proceed, in Example 35.2, to test the various hypotheses corresponding to the questions asked in 35.15.


Example 35.2 Two-way cross-classification with proportional frequencies

The results of our investigations so far show that the linear model (35.32) for the two-way cross-classification is representable in the non-singular form

$$y = X\theta + \epsilon \tag{35.51}$$

where X is given at (35.35) [...]. Question (1) in 35.15 corresponds to the hypothesis

$$H_1: \theta_{1.} = \theta_{2.} = \cdots = \theta_{r-1,.} = 0, \tag{35.52}$$

which imposes $(r-1)$ constraints. Question (2) in 35.15 is similarly the $(c-1)$-constraint hypothesis

$$H_2: \theta_{.1} = \theta_{.2} = \cdots = \theta_{.,c-1} = 0, \tag{35.53}$$

question (3) in 35.15 the $(r-1)(c-1)$-constraint hypothesis

$$H_3: \theta_{ij} = 0, \qquad i = 1, 2, \ldots, r-1;\ j = 1, 2, \ldots, c-1, \tag{35.54}$$

and question (4) in 35.15 the single-constraint hypothesis

[...] (35.55)

All four hypotheses (35.52-5) are composite. To test any one of them, we must find the SS attributable to that hypothesis, and use the general result of 35.8. [...] we have to partition the SS due to the fitted model [...] particularly straightforward [...] orthogonal sets of estimators. In fact, $X'X$ was given at (35.41).

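We now write the LS estimators from (35.47) and (35.50) in the form

    $\hat\theta = \begin{pmatrix} \bar y_{...} \\ \bar y_{1..} - \bar y_{...} \\ \vdots \\ \bar y_{r-1,..} - \bar y_{...} \\ \bar y_{.1.} - \bar y_{...} \\ \vdots \\ \bar y_{.,c-1,.} - \bar y_{...} \\ \bar y_{11.} - \bar y_{1..} - \bar y_{.1.} + \bar y_{...} \\ \vdots \\ \bar y_{r-1,c-1,.} - \bar y_{r-1,..} - \bar y_{.,c-1,.} + \bar y_{...} \end{pmatrix} = \begin{pmatrix} \hat\theta_{..} \\ \hat\theta_1 \\ \hat\theta_2 \\ \hat\theta_3 \end{pmatrix},$    (35.56)

where the subvectors of $\hat\theta$ have 1, r-1, c-1 and (r-1)(c-1) components respectively.
From (35.56) we have the decomposition

    $\hat\theta' X'X \hat\theta = n\hat\theta_{..}^2 + \hat\theta_1' A \hat\theta_1 + \hat\theta_2' B \hat\theta_2 + \hat\theta_3' C \hat\theta_3,$    (35.57)

where A, B, C are the submatrices of (35.36) given at (35.37-9). The first term on
the right of (35.57) is the SS attributable to $H_4$, which we write explicitly as

    $S_4 = n\bar y_{...}^2.$    (35.58)

The matrix A at (35.37) may be written in the form

    $A = D_{n_{i.}} + n_{r.}^{-1}\, \mathbf{n}_{r-1} \mathbf{n}_{r-1}',$

where $D_{n_{i.}}$ is a (r-1) x (r-1) diagonal matrix with elements $n_{i.}$ and $\mathbf{n}_{r-1}$ is a
(r-1) x 1 vector with elements $n_{i.}$. The second term on the right of (35.57) is now seen to be

    $S_1 = \hat\theta_1' \{ D_{n_{i.}} + n_{r.}^{-1} \mathbf{n}_{r-1}\mathbf{n}_{r-1}' \} \hat\theta_1$
        $= \sum_{i=1}^{r-1} n_{i.}\hat\theta_{i.}^2 + \frac{1}{n_{r.}} \Big( \sum_{i=1}^{r-1} n_{i.}\hat\theta_{i.} \Big)^2$
        $= \sum_{i=1}^{r} n_{i.} (\bar y_{i..} - \bar y_{...})^2.$    (35.59)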

This is the SS attributable to $H_1$. In an exactly similar way, using B at (35.38), we
find for the third term on the right of (35.57)

    $S_2 = \sum_{j=1}^{c} n_{.j} (\bar y_{.j.} - \bar y_{...})^2,$    (35.60)

the SS attributable to $H_2$. Finally, we find from (35.39) that the last term on the
right of (35.57) is, writing $\hat\theta_{ij}$ for the estimators (35.50),

    $S_3 = \sum_{i=1}^{r-1}\sum_{j=1}^{c-1} n_{ij}\hat\theta_{ij}^2 + \frac{1}{n_{rc}} \Big( \sum_{i=1}^{r-1}\sum_{j=1}^{c-1} n_{ij}\hat\theta_{ij} \Big)^2$
        $+ \sum_{i=1}^{r-1} \frac{1}{n_{ic}} \Big( \sum_{j=1}^{c-1} n_{ij}\hat\theta_{ij} \Big)^2 + \sum_{j=1}^{c-1} \frac{1}{n_{rj}} \Big( \sum_{i=1}^{r-1} n_{ij}\hat\theta_{ij} \Big)^2$
        $= \sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij} (\bar y_{ij.} - \bar y_{i..} - \bar y_{.j.} + \bar y_{...})^2,$    (35.61)

upon use of linear relationships among the estimated interactions precisely analogous
to those for the interaction parameters given in (35.34). The four SS defined in
(35.58-61) exhaust the SS due to the fitted model (35.51). The only other quantity
we shall require is the Residual SS, which here, as generally, is the difference

    $S_R = y'y - \hat\theta' X'X \hat\theta = \sum_i \sum_j \sum_p y_{ijp}^2 - (S_1 + S_2 + S_3 + S_4).$    (35.62)

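For computational purposes, the other SS are written in the forms

    $S_4 = \Big( \sum_i \sum_j \sum_p y_{ijp} \Big)^2 / n,$

    $S_1 = \sum_i \Big\{ \Big( \sum_j \sum_p y_{ijp} \Big)^2 / n_{i.} \Big\} - \Big( \sum_i \sum_j \sum_p y_{ijp} \Big)^2 / n,$

    $S_2 = \sum_j \Big\{ \Big( \sum_i \sum_p y_{ijp} \Big)^2 / n_{.j} \Big\} - \Big( \sum_i \sum_j \sum_p y_{ijp} \Big)^2 / n,$

    $S_3 = \sum_i \sum_j \Big\{ \Big( \sum_p y_{ijp} \Big)^2 / n_{ij} \Big\} - \sum_i \Big\{ \Big( \sum_j \sum_p y_{ijp} \Big)^2 / n_{i.} \Big\}$
        $- \sum_j \Big\{ \Big( \sum_i \sum_p y_{ijp} \Big)^2 / n_{.j} \Big\} + \Big( \sum_i \sum_j \sum_p y_{ijp} \Big)^2 / n,$    (35.63)

which the reader should verify.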


On substituting from (35.63) into (35.62), we find

    $S_R = \sum_i \sum_j \sum_p y_{ijp}^2 - \sum_i \sum_j \Big\{ \Big( \sum_p y_{ijp} \Big)^2 / n_{ij} \Big\}.$    (35.64)

We may now, as we did in Example 35.1, assemble the results of our analysis in a table:

AV table for a two-way cross-classification with proportional frequencies

Variation          D.fr.         SS       MS                     Non-central parameter $\lambda$

Between rows       r-1           $S_1$    $S_1/(r-1)$            $\sum_i n_{i.}\theta_{i.}^2/\sigma^2$
Between columns    c-1           $S_2$    $S_2/(c-1)$            $\sum_j n_{.j}\theta_{.j}^2/\sigma^2$
Interactions       (r-1)(c-1)    $S_3$    $S_3/\{(r-1)(c-1)\}$   $\sum_i \sum_j n_{ij}\theta_{ij}^2/\sigma^2$
Residual           n-rc          $S_R$    $S_R/(n-rc)$
                   n-1
General mean       1             $S_4$    $S_4$                  $n\theta_{..}^2/\sigma^2$
Total              n             y'y

                                                                  (35.65)


The general theory of 35.8 tells us that the LR test of any of the hypotheses $H_1$ to $H_4$
is obtained by using the ratio of the corresponding MS in (35.65) to the Residual MS
and rejecting the hypothesis for large values of the ratio. Each of these ratios is a
non-central F variate with d.fr. as given in the table and non-central parameter (obtained
by the rule in 35.8) given in the last column.

To test the comprehensive hypothesis

    $H_0: \theta = 0$    (35.66)

for all the parameters (which means that $H_1$, $H_2$, $H_3$ and $H_4$ all hold), the same theory
tells us that the ratio to be used is

    $F = \dfrac{(S_1 + S_2 + S_3 + S_4)/rc}{S_R/(n-rc)},$    (35.67)

which is a non-central F variate with (rc, n-rc) d.fr., its non-central parameter being
obtained by the rule of 35.8, substituting $\theta$ for $\hat\theta$ in the numerator SS. This test is
exactly the test, mentioned in 35.15, of the rc cells regarded as a one-way classification.
Similarly, a test of the same form as (35.67), with the numerator suitably modified,
is equivalent to (35.22).
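As a numerical companion to Example 35.2, the computational forms of the SS and the variance-ratios against the Residual MS can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `two_way_av`, the nested-list layout and the data are assumptions, and the decomposition is valid as written only when the cell frequencies are proportional.

```python
def two_way_av(cells):
    """SS and variance-ratios for a proportional-frequencies r x c layout.

    cells[i][j] is the list of observations in row i, column j (assumed layout).
    """
    r, c = len(cells), len(cells[0])
    n_ij = [[len(cell) for cell in row] for row in cells]
    t_ij = [[sum(cell) for cell in row] for row in cells]
    n = sum(map(sum, n_ij))
    total = sum(map(sum, t_ij))
    # Squared marginal totals divided by the corresponding frequencies.
    rows = sum(sum(t_ij[i]) ** 2 / sum(n_ij[i]) for i in range(r))
    cols = sum(sum(t_ij[i][j] for i in range(r)) ** 2 /
               sum(n_ij[i][j] for i in range(r)) for j in range(c))
    cell_term = sum(t_ij[i][j] ** 2 / n_ij[i][j]
                    for i in range(r) for j in range(c))
    raw = sum(y * y for row in cells for cell in row for y in cell)
    s4 = total ** 2 / n                    # general mean
    s1 = rows - s4                         # between rows
    s2 = cols - s4                         # between columns
    s3 = cell_term - rows - cols + s4      # interactions
    sr = raw - cell_term                   # residual
    ms_res = sr / (n - r * c)
    ratios = {"rows": s1 / (r - 1) / ms_res,
              "columns": s2 / (c - 1) / ms_res,
              "interactions": s3 / ((r - 1) * (c - 1)) / ms_res,
              "general mean": s4 / ms_res}
    return {"S1": s1, "S2": s2, "S3": s3, "S4": s4, "SR": sr, "total": raw}, ratios


# Hypothetical data with proportional frequencies n_ij = (1, 2; 2, 4).
cells = [[[3.0], [5.0, 7.0]],
         [[4.0, 6.0], [8.0, 10.0, 9.0, 11.0]]]
ss, ratios = two_way_av(cells)
print(ss)
print(ratios)
```

Each ratio would then be referred to the F distribution with the d.fr. of its numerator and the Residual d.fr.; the five SS add to the total y'y, which serves as a check on the arithmetic.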
The equal-frequencies (balanced) case

35.25 The most important case of the proportional-frequencies situation (35.43)
arises when all cell-frequencies $n_{ij}$ are equal, say to m. The Residual SS then has
n - rc = rc(m-1) d.fr., and a difficulty arises when m = 1: the Residual d.fr. are zero
and all the tests given in Example 35.2 then become impossible, for we are
having to estimate rc parameters from the same number of observations. Not
surprisingly, we can do this exactly, with no residual variation; we are in just the same
position as we should be in fitting a polynomial of degree q-1 (requiring q constants)
to a set of q observations. Thus we can estimate all rc parameters even when m = 1,
but only at the expense of seeing our Residual SS disappear.

There is no way out of this difficulty unless we consent to reduce the number of
parameters in the model, and what we shall in fact do is to discard the (r-1)(c-1)
interaction parameters $\theta_{ij}$, leaving ourselves with r+c-1 parameters to be estimated.
We shall then have a new Residual SS to replace (35.64), and in fact this will be seen
in Example 35.3 to be precisely the former Interaction SS, $S_3$, defined at (35.61).

It should not need to be stressed that this restricted model, without interaction 
parameters, is unsuitable for the analysis of data where interactions do exist. For 
this reason it is inadvisable to restrict ourselves voluntarily to one observation per cell 
of a cross-classification unless we are sure that rows and columns do not interact. 
However, considerations of cost or time sometimes enforce such a restriction. 


Example 35.3 Two-way cross-classification with exactly one observation per cell

If the interaction parameters $\theta_{ij}$ are dropped from the linear model, we now have,
with one observation per cell,

    $y_{ij} = \theta_{..} + \theta_{i.} + \theta_{.j} + \varepsilon_{ij},$

where, to avoid singularities, we define $\theta_{i.}$ for i = 1, 2, . . . , r-1 and $\theta_{.j}$ for
j = 1, 2, . . . , c-1, as previously. All the work of 35.17-19 in respect of our present
parameters holds good. The matrix X defined at (35.35) remains valid if we use only
its first (r+c-1) columns, as does the leading (r+c-1) x (r+c-1) submatrix of
X'X at (35.36), in which we now still have D = 0 since the proportional-frequencies
condition (35.43) holds here. A and B at (35.37-8) and their inverses at (35.44-5)
are unaffected, as are the vectors (35.46-7), which are now complete instead of partial
vectors for the LS estimators of our parameters. We may therefore test the hypotheses
$H_1$, $H_2$ and $H_4$ at (35.52-3) and (35.55) exactly as in Example 35.2, the only difference
being that what was previously the Interaction SS, $S_3$, now becomes the new Residual
SS, for the four SS in the following abbreviated table must add to y'y, as always.
    (35.68)

Although we have had to sacrifice the Interactions SS in order to obtain a Residual SS,
we can still test for interactions by separating off from that Residual SS an
appropriate component. Consider the linear form

    $L = \sum_i \sum_j c_{ij} \hat\theta_{ij},$

where $\hat\theta_{ij}$ is defined by (35.50) (the final suffix to the y's is now redundant) and the $c_{ij}$
are coefficients to be determined. If the interactions $\theta_{ij}$ are all zero, but in general not
otherwise, E(L) = 0, and it is thus intuitively reasonable to use a statistic of the form L
to test the hypothesis of zero interactions. If we choose the $c_{ij}$ so that
$\sum_i c_{ij} = \sum_j c_{ij} = 0$, we see from (35.50) that

    $L = \sum_i \sum_j c_{ij} y_{ij},$

and hence

    $\mathrm{var}\, L = C^2 \sigma^2,$

where $C^2 = \sum_i \sum_j c_{ij}^2$ and $\sigma^2$ is the error variance as usual. Thus $L^2/(C^2\sigma^2)$ is a $\chi^2$
variable with 1 d.fr. when the interactions are zero. Moreover, our present Residual
SS at (35.61) is $S_3 = \sum_i \sum_j \hat\theta_{ij}^2$, and $\{S_3 - L^2/C^2\}/\sigma^2$ is independent of $L^2/(C^2\sigma^2)$,
since the $\hat\theta_{ij}$ can be orthogonally transformed to (r-1)(c-1) standardized independent
normal variates of which one is $L/(C\sigma)$, and $\{S_3 - L^2/C^2\}/\sigma^2$ will be the sum of squares
of the others, distributed as $\chi^2$ with (r-1)(c-1) - 1 = rc-r-c d.fr.

It remains to choose the $c_{ij}$. They can be functions of the $\hat\theta_{i.}$, $\hat\theta_{.j}$, since the latter
are distributed independently of the $\hat\theta_{ij}$; the conditional distributions for fixed
$\hat\theta_{i.}$, $\hat\theta_{.j}$ will be as given above. A simple choice is $c_{ij} = \hat\theta_{i.}\hat\theta_{.j}$, which satisfies
the conditions on the $c_{ij}$, so that we may define

    $S_3' = \dfrac{L^2}{C^2} = \dfrac{\big( \sum_i \sum_j \hat\theta_{i.}\hat\theta_{.j} y_{ij} \big)^2}{\sum_i \hat\theta_{i.}^2 \sum_j \hat\theta_{.j}^2}.$    (35.69)

$S_3'/\sigma^2$ is a $\chi^2$ variable with 1 d.fr. and $(S_3 - S_3')/\sigma^2$ independently a $\chi^2$ variable with
(rc-r-c) d.fr. Their ratio, each divided by its d.fr., $(rc-r-c) S_3'/(S_3 - S_3') = F$, has the
variance-ratio distribution with (1, rc-r-c) d.fr., and may be used to test the hypothesis
that all interactions are zero. This test for complete additivity of effects was suggested
by Tukey (1949), who generalized it further; see Scheffé (1959). M. N. Ghosh and
Sharma (1963) studied its power against the alternative that there are interactions of
form $\theta_{ij} = \alpha\theta_{i.}\theta_{.j}$. For the 6 x 6 classification, the power was found to be of the same
order as the F-test for interactions obtained by equating adjacent pairs of the $\theta_{i.}$ and
of the $\theta_{.j}$.
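The one-degree-of-freedom component (35.69) is easily computed from an r x c table with a single observation per cell. The sketch below is illustrative rather than definitive: the function names `tukey_one_df` and `tukey_f` and the two small data tables are hypothetical, and the code assumes that neither the estimated row-effects nor the estimated column-effects all vanish (otherwise the divisor is zero).

```python
def tukey_one_df(y):
    """Return (non-additivity SS, interaction-type SS) for a table y[i][j]."""
    r, c = len(y), len(y[0])
    grand = sum(map(sum, y)) / (r * c)
    a = [sum(row) / c - grand for row in y]                      # row-effects
    b = [sum(y[i][j] for i in range(r)) / r - grand for j in range(c)]  # column-effects
    # Residual SS of the no-interaction model (the former Interaction SS).
    s3 = sum((y[i][j] - grand - a[i] - b[j]) ** 2
             for i in range(r) for j in range(c))
    # Linear form L with coefficients c_ij = a_i * b_j, and its divisor C^2.
    big_l = sum(a[i] * b[j] * y[i][j] for i in range(r) for j in range(c))
    c2 = sum(x * x for x in a) * sum(x * x for x in b)
    return big_l ** 2 / c2, s3


def tukey_f(y):
    """Variance-ratio with (1, rc - r - c) d.fr. for the hypothesis of additivity."""
    r, c = len(y), len(y[0])
    s_nonadd, s3 = tukey_one_df(y)
    return (r * c - r - c) * s_nonadd / (s3 - s_nonadd)


# Hypothetical 3 x 3 table with some non-additivity.
table = [[1.0, 2.0, 4.0], [2.0, 5.0, 8.0], [3.0, 6.0, 13.0]]
s_nonadd, s3 = tukey_one_df(table)
f_ratio = tukey_f(table)

# A purely multiplicative table y_ij = a_i * b_j: here the whole of the
# interaction-type SS falls in the single degree of freedom.
mult = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0], [3.0, 6.0, 12.0]]
m_nonadd, m_s3 = tukey_one_df(mult)
print(s_nonadd, s3, f_ratio, m_nonadd, m_s3)
```

The multiplicative table illustrates why the test is sensitive to interactions proportional to the product of row- and column-effects: for such data the remainder SS is zero, so the ratio itself cannot be formed and only the two SS are compared.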

Choice of weights 

35.26 We must now discuss a point which we deliberately passed over in formulating
our linear model in 35.17-19. We observed there that we had (r+c+1) parameters
in our original model which were redundant in the sense that they were linearly
dependent upon the rc other parameters, and we therefore eliminated them using the
set of linear relations given in (35.33), leading to (35.34), which determined the structure
of the basic matrix (35.35). It is now necessary to recognize that the set of relations
given in (35.33) is essentially arbitrary: in the first relation given there, for example,
we chose to equate to zero the particular linear function $\sum_i n_{i.}\theta_{i.}$, using as weights the
marginal row frequencies $n_{i.}$. This may seem natural, but it is by no means necessary:
we might have chosen instead to use equal weights, so that $\sum_i \theta_{i.} = 0$, or indeed any
weights $w_i$, so that $\sum_i w_i\theta_{i.} = 0$.

If the complete set of n observations were a simple random sample from some population,
the observed $n_{i.}/n$ would be estimates of the population relative frequencies in
the row categories, and it would therefore be meaningful to define the row-effects
using these weights to express their linear dependence. Similarly, $\sum_j n_{.j}\theta_{.j} = 0$ and
the other relations in (35.33) would be meaningful in the same context. We call these
the frequency weights.

35.27 However, in many experimental contexts there is no question of the observations
being a random sample from some population: the r x c cross-classification is
deliberately set up to throw light on the variable (y) being studied. The use of observed
frequencies as weights in the linear relations (35.33) is then no longer readily interpretable.
It may even be meaningless to consider any set of weights as the “right” ones,
in the sense of reflecting an underlying population distribution; for example, if we
have a 2 x 3 cross-classification to study the effects of two different doses of Fertilizer A
and three different doses of Fertilizer B on the yield of a crop (y), one may be simply
interested in the effects and interactions as such, and not as representing any population
at all. There is a crucial distinction here between the “experimental” and the
“survey” approach to data, to which we shall revert in Chapters 38-9.

In experimental investigations, therefore, it is common (for lack of any known
appropriate system of weights) to use equal weights throughout (35.33). For the
remainder of this chapter, the equal-weights system means that (35.33) holds with all
the frequency symbols n suppressed, i.e. replaced by 1's. It is to be observed that, whereas
35.28 The choice of weights affects the definitions of the row- and column-effects
and of the parameters generally. However, if the true interactions $\theta_{ij}$ are zero under
any weighting system, they will be so for every weighting system. For under the first
weighting system, (35.31) shows that when $\theta_{ij} = 0$,

    $\mu_{ij} = \theta_{..} + \theta_{i.} + \theta_{.j}.$

This is of the form

    $\mu_{ij} = a_i + b_j + c,$

and it is evident from the definition of interactions in 35.18 that if the $\mu_{ij}$ were
representable as the sum of row-, column- and general components, these would be absorbed
into the row-effects, column-effects and the general mean respectively, leaving the
interaction equal to zero. We thus have $\theta_{ij} = 0$ for all i, j under the second weighting
system also. If $H_3$ of (35.54) holds, therefore, it holds for every weighting system.
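Disproportional frequencies

35.29 The proportional-frequencies condition (35.43) ensured that the submatrix D
of (35.36) was null. The simplified analysis of 35.22-5 is no longer valid if D in (35.36)
is non-null, and we now consider this most general case. (35.36) may be partitioned as

    $X'X = \begin{pmatrix} n & 0 & 0 & 0 \\ 0 & A & D & 0 \\ 0 & D' & B & 0 \\ 0 & 0 & 0 & C \end{pmatrix},$    (35.70)

of which the proportional-frequencies case was the special case D = 0. If we write
$E = (B - D'A^{-1}D)^{-1}$, the inverse of the central submatrix of this is

    $\begin{pmatrix} A & D \\ D' & B \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + A^{-1}DED'A^{-1} & -A^{-1}DE \\ -ED'A^{-1} & E \end{pmatrix},$

as may be verified by multiplication, so that

    $(X'X)^{-1} = \begin{pmatrix} 1/n & 0 & 0 & 0 \\ 0 & A^{-1} + A^{-1}DED'A^{-1} & -A^{-1}DE & 0 \\ 0 & -ED'A^{-1} & E & 0 \\ 0 & 0 & 0 & C^{-1} \end{pmatrix}.$    (35.71)

The (r+c-1) leading diagonal submatrix of (35.42) is now replaced by that of (35.71).
If we write (35.46) concisely as

    $X'y = ( n\bar y_{...}, \; v_{r-1}', \; v_{c-1}', \; v_{(r-1)(c-1)}' )',$

each v being the subvector of (35.46) with number of rows indicated by its suffix,
we may generalize (35.47), using (35.70-1), to

    $\big( (X'X)^{-1} X'y \big)_{r+c-1} = \begin{pmatrix} \bar y_{...} \\ A^{-1}v_{r-1} - A^{-1}DE (v_{c-1} - D'A^{-1}v_{r-1}) \\ E (v_{c-1} - D'A^{-1}v_{r-1}) \end{pmatrix}.$    (35.72)

Thus the estimators $\hat\theta_{i.}$, $\hat\theta_{.j}$ are numerically determinable, while $\hat\theta_{..} = \bar y_{...}$ always,
as is intuitively obvious. As in 35.24, the definition of the interactions at (35.31)
then implies that their estimators satisfy

    $\hat\theta_{ij} = \bar y_{ij.} - \hat\theta_{i.} - \hat\theta_{.j} - \bar y_{...},$    (35.73)

so that the LS estimators of all the parameters are determined. The generalization
of the decomposition (35.57) is

    $\hat\theta' X'X \hat\theta = n\hat\theta_{..}^2 + (\hat\theta_1', \hat\theta_2') \begin{pmatrix} A & D \\ D' & B \end{pmatrix} \begin{pmatrix} \hat\theta_1 \\ \hat\theta_2 \end{pmatrix} + \hat\theta_3' C \hat\theta_3.$    (35.74)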
Example 35.4 Two-way cross-classification with disproportional frequencies and frequency
weights

(35.74) shows at once that $H_3$ and $H_4$ of (35.54-5) can each be tested in the manner
of Example 35.2, although the SS attributable to the interactions must now be numerically
evaluated from the last term on the right of (35.74). Thus both the general mean
and the interactions MS can be tested (with 1 and (r-1)(c-1) d.fr. respectively)
against the Residual MS, since they are non-central F variables, irrespective of the
row- and column-effects.

The SS attributable to the row- and column-effects jointly is the middle term on the
right of (35.74). If equal weights were used instead of frequency weights in (35.33),
the SS attributable to interactions would not be a separate component in (35.74).
35.30 Use of the equal-weights system instead makes the testing of row- and
of column-effects computationally a good deal simpler than with the frequency
weights used in Example 35.4. We may proceed as follows.
Suppose that we first analyse the r x c cross-classification as if it were a one-way
classification with rc groups. Using Example 35.1, we find the AV table below:

AV for any r x c cross-classification

Variation                              D.fr.     SS

Due to classification as a whole       rc        $\sum_i \sum_j n_{ij} \bar y_{ij.}^2$
Residual                               n-rc      $\sum_i \sum_j \sum_p (y_{ijp} - \bar y_{ij.})^2$
Total                                  n         $\sum_i \sum_j \sum_p y_{ijp}^2$

                                                                  (35.75)
If we use equal weights instead of the frequency weights in (35.33), the two
summations on the right of (35.77) are each equal to zero and we have simply

    $\bar y_i = \theta_{..} + \theta_{i.} + \bar\varepsilon_i.$    (35.78)

(35.78) is the one-way classification model (Example 35.1) with a single observation
in each group, except that the error variances are not equal. If we define $z_i = \bar y_i / v_i^{1/2}$,
we have $E(z_i) = (\theta_{..} + \theta_{i.}) / v_i^{1/2}$, and the conditions of Example 35.1 are otherwise
satisfied. The effect of this on the analysis is to replace $S_2$ defined at (35.21), which
in our present application would be $\sum_i \big( \bar y_i - \frac{1}{r} \sum_i \bar y_i \big)^2$, by the same sum with each term
given the coefficient $v_i^{-1}$, i.e.

    $S_2 = \sum_{i=1}^{r} v_i^{-1} (\bar y_i - \bar y)^2 = \sum_{i=1}^{r} \Big( \sum_{j=1}^{c} n_{ij}^{-1} / c^2 \Big)^{-1} (\bar y_i - \bar y)^2,$

where

    $\bar y = \sum_i v_i^{-1} \bar y_i \Big/ \sum_i v_i^{-1}$

is the weighted mean of the $\bar y_i$, using $v_i^{-1}$ as weights.
We therefore have an AV table as follows:

AV for r x c cross-classification using equal weights

Variation                              D.fr.        SS

Due to rows                            r-1          $\sum_{i=1}^{r} v_i^{-1} (\bar y_i - \bar y)^2$
Due to remainder of classification     r(c-1)+1     [Obtainable as a difference]
                                       rc
Residual                               n-rc         As in (35.75)
Total                                  n            $\sum_i \sum_j \sum_p y_{ijp}^2$

                                                                  (35.79)


An exactly analogous breakdown of the “classification” SS can be made for the
columns-classification. We therefore have tests of row- and of column-effects in the
general case. “Rows” and “Columns” SS cannot be added because of their
non-orthogonality, so that we cannot obtain the Interactions SS by differencing. However,
a test for interactions is easily derived by this method (cf. Exercise 35.5).
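The equal-weights SS for rows in (35.79) can be computed without any matrix inversion. The sketch below is a hedged illustration, not the authors' procedure: the function names and data are hypothetical, $\bar y_i$ is taken to be the unweighted mean of the cell means in row i, and $v_i$ is taken to be $\sum_j 1/(c^2 n_{ij})$, so that $\sigma^2 v_i$ is the variance of $\bar y_i$. Every cell is assumed to be non-empty.

```python
def equal_weights_rows_ss(cells):
    """SS 'due to rows' under equal weights; cells[i][j] is a list of observations."""
    c = len(cells[0])
    # Unweighted mean of the cell means in each row.
    ybar = [sum(sum(cell) / len(cell) for cell in row) / c for row in cells]
    # var(ybar_i) = sigma^2 * v_i with v_i = sum_j 1 / (c^2 n_ij).
    v = [sum(1.0 / len(cell) for cell in row) / c ** 2 for row in cells]
    w = [1.0 / vi for vi in v]
    centre = sum(wi * yi for wi, yi in zip(w, ybar)) / sum(w)
    ss_rows = sum(wi * (yi - centre) ** 2 for wi, yi in zip(w, ybar))
    return ss_rows, ybar, v


def residual_ms(cells):
    """Within-cell Residual MS with n - rc d.fr. (needs at least one cell with n_ij > 1)."""
    ss, df = 0.0, 0
    for row in cells:
        for cell in row:
            m = sum(cell) / len(cell)
            ss += sum((y - m) ** 2 for y in cell)
            df += len(cell) - 1
    return ss / df


# Balanced check: with m = 2 observations per cell the result should agree
# with the ordinary "between rows" SS.
balanced = [[[1.0, 3.0], [2.0, 4.0]], [[5.0, 7.0], [6.0, 8.0]]]
ss_bal, ybar_bal, v_bal = equal_weights_rows_ss(balanced)
f_bal = ss_bal / (len(balanced) - 1) / residual_ms(balanced)

# Disproportional (hypothetical) data.
unbalanced = [[[3.0], [5.0, 7.0], [6.0]],
              [[4.0, 6.0], [8.0], [9.0, 11.0, 10.0]]]
ss_unb, ybar_unb, v_unb = equal_weights_rows_ss(unbalanced)
f_unb = ss_unb / (len(unbalanced) - 1) / residual_ms(unbalanced)
print(ss_bal, f_bal, ss_unb, f_unb)
```

The same function applied to the transposed table gives the corresponding SS for columns; as noted in the text, the two cannot be added in the disproportional case.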

AV 3 nf?Qt Th<! T al ' Wei 5 ht . S s y stem . whose use permitted the development of th 

Themis 0 ZZTJf ‘‘“VS ‘ h ° Ugh they Were ind ™ d ^ ote“n 
the corresponding interaction $\theta_{ij}$ is non-zero, from other cells in the table. It
follows from the definitions (35.31) and the linear relations (35.33) that none of the
$\theta_{i.}$, $\theta_{.j}$ or $\theta_{..}$ can be estimated in the general case if there are one or more empty
cells in the cross-classification. However, even in this case we can estimate the error
variance quite easily. If we denote the number of cells containing observations by
[rc], we obtain the more general form of (35.75):

AV for any r x c cross-classification

- , i r 17 tft -6 below, we shall be giving 

are — mUsing - 
5 

35.36 The cross-classification which we have treated at length in 35.15-35 is not the
only interesting generalization of the one-way classification in Example 35.1.
Suppose that within each of the k groups there considered, there is a further one-way
classification of the observations. The $n_1$ observations in the first group are divided
into $l_1$ sub-groups, with frequencies $n_{11}, n_{12}, \ldots, n_{1 l_1}$, where $\sum_h n_{1h} = n_1$; the second
group similarly has $l_2$ sub-groups, with frequencies $n_{21}, n_{22}, \ldots, n_{2 l_2}$, $\sum_h n_{2h} = n_2$; and
so on until in the kth group there are $l_k$ sub-groups with frequencies $n_{k1}, n_{k2}, \ldots, n_{k l_k}$,
$\sum_h n_{kh} = n_k$. It will accord better with our notational conventions if we now replace
the original group frequencies $n_i$ of Example 35.1 by $n_{i.}$, to denote summation of the
sub-group frequencies $n_{ih}$ within the original groups. Thus we have

    $\sum_{h=1}^{l_i} n_{ih} = n_{i.}.$

This is a two-way hierarchical classification† of the observations, the separate
sub-grouping within each of the original groups contrasting with the common row-grouping
of every column category in a two-way cross-classification.


Example 35.5 AV in a two-way hierarchical classification

In Example 35.1 we have already defined k parameters, one for each group. In
order to investigate variation in the means $\theta_{ih}$ of the $l_i$ sub-groups within the ith group,
we use only $l_i - 1$ linearly independent parameters, for we may put

    $\sum_{h=1}^{l_i} n_{ih}\theta_{ih} = 0$

(cf. 35.17-19 for the cross-classification) so that, as at (35.34),

    $\theta_{i l_i} = -\dfrac{1}{n_{i l_i}} \sum_{h=1}^{l_i - 1} n_{ih}\theta_{ih}.$    (35.81)

We may now generalize the linear model in Example 35.1. We write $l = \sum_i l_i$ and
$y_{ihp}$ for the pth observation in the hth sub-group of the ith group. We have

† The alternative term “nested classification” appears to be more easily taken to imply
that there is the same number of sub-groups in each original group, and we therefore do not
use it despite its
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The (nxk) X submatrix in (35.82) is that used in Example 35.1. Each 
submatrices X is of the form F 


X = 

(«l.x (/,-!)) 
which follows at once from (35.81). (35.82-3) now give
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(35.84) 


is of the same form as (35.37) and therefore has the inverse, of the same form as (35.44), 
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(35.85) 
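Hence, from (35.84-5), the LS estimators are

    $\hat\theta = (X'X)^{-1}X'y = \big( \bar y_{1..}, \; \bar y_{2..}, \; \ldots, \; \bar y_{k..}, \; \bar y_{11.} - \bar y_{1..}, \; \bar y_{12.} - \bar y_{1..}, \; \ldots, \; \bar y_{1, l_1 - 1, .} - \bar y_{1..}, \; \bar y_{21.} - \bar y_{2..}, \; \ldots, \; \bar y_{k, l_k - 1, .} - \bar y_{k..} \big)',$    (35.86)

and thus the SS due to the fitted model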
The first term on the right of (35.87) is precisely $S_1$ defined at (35.16) in Example 35.1.

    (35.88)

The ratio of the “Between sub-groups within groups” MS to the Residual MS is,
from our general theory, a non-central F variable with d.fr. (l-k, n-l) and
non-central parameter $\lambda = \sum_i \sum_h n_{ih}\theta_{ih}^2 / \sigma^2$, which is zero when all sub-group means within
each group are equal, giving a central F-test for this hypothesis.
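The components of the hierarchical AV can be obtained directly from group and sub-group means. The following Python sketch is an illustration under stated assumptions: the function name `nested_av`, the nested-list layout (`groups[i][h]` holding the observations of the hth sub-group of the ith group) and the data are hypothetical, and the "groups" component is taken here to be $\sum_i n_{i.}\bar y_{i..}^2$, with k d.fr.

```python
def nested_av(groups):
    """SS for a two-way hierarchical classification; groups[i][h] is a list."""
    k = len(groups)
    l = sum(len(g) for g in groups)                  # total number of sub-groups
    n = sum(len(s) for g in groups for s in g)       # total number of observations
    ss_groups = ss_sub = ss_res = 0.0
    for g in groups:
        n_i = sum(len(s) for s in g)
        mean_i = sum(sum(s) for s in g) / n_i
        ss_groups += n_i * mean_i ** 2               # k d.fr.
        for s in g:
            mean_ih = sum(s) / len(s)
            ss_sub += len(s) * (mean_ih - mean_i) ** 2      # l - k d.fr. in all
            ss_res += sum((y - mean_ih) ** 2 for y in s)    # n - l d.fr. in all
    f_ratio = (ss_sub / (l - k)) / (ss_res / (n - l))
    return ss_groups, ss_sub, ss_res, f_ratio


# Hypothetical data: k = 2 groups with 2 and 3 sub-groups respectively.
groups = [[[1.0, 3.0], [5.0, 7.0, 9.0]],
          [[2.0, 4.0], [6.0], [8.0, 10.0]]]
ss_groups, ss_sub, ss_res, f_ratio = nested_av(groups)
total = sum(y * y for g in groups for s in g for y in s)
print(ss_groups, ss_sub, ss_res, f_ratio, total)
```

The three components add to the total y'y, and the ratio returned is the one referred to F with (l-k, n-l) d.fr. The unequal numbers of sub-groups in the example are deliberate: the calculation does not require them to be equal.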


35.37 The hierarchical process can clearly be carried further, with sub-sub-groups
and even sub-sub-sub-groups. These would be termed three-way and four-way
hierarchical classifications, and are relatively rare in practice. It should be obvious
to the reader that there is no need to go through again the rather tedious algebra of
LS theory to obtain the results we need here; the work of Example 35.5 has, in effect,
split the SS within each of the k groups into two components with $(l_i - 1)$ and $(n_{i.} - l_i)$
d.fr. respectively, and summed corresponding components over groups to obtain
the (l-k) and (n-l) d.fr. in the table (35.88). The same splitting-off process can
now be carried out within each sub-group, and so on. The reader is asked to verify
the three-way AV in Exercise 35.3. Scheffé (1959) gives theoretical details for the
three-way case.
SS due to        D.fr.            Second-stage subdivision             D.fr.

General mean     1
Rows             l-1              Row groups                           k-1
                                  Row sub-groups                       l-k
Columns          c-1
Interactions     (l-1)(c-1)       Interactions with groups             (k-1)(c-1)
                                  Interactions with sub-groups         (l-k)(c-1)
Residual         n-lc
Total            n

At the second stage, each of the SS involving the hierarchical (row) classification
is subdivided into two parts as indicated on the right. The first of these subdivisions
is a direct application of Example 35.5 (it being remembered that the general mean
component has here already been removed from the first line of (35.88) by the
cross-classification analysis), but the simplest way of achieving both subdivisions is to merge
all sub-groups within the groups of the hierarchical classification and recalculate the
SS for Rows and Interactions using the merged data; these are the required component
SS, with (k-1) and (k-1)(c-1) d.fr. respectively. The sub-groups SS are then
obtained as differences if the analysis is orthogonal.

Scheffé (1959) gives theoretical details for the case where there is the same number
of sub-groups in each group of the hierarchical classification and the same number
of observations in each cell of the l x c table.


35.39 Suppose now that, instead of embedding a two-way hierarchical classification
within a two-way cross-classification as in 35.38, we carry out a new one-way classification
within each cell of a cross-classification. If the same one-way classification is
carried out in each cell, we clearly arrive at a three-way cross-classification. All the
problems of formulating the linear model, discussed in 35.15-19 for the two-way case,
now arise afresh, and some generalization of our concepts is required, as we shall now
35.41 We write the linear model

    $y_{ijkp} = \theta_{...} + \theta_{i..} + \theta_{.j.} + \theta_{..k} + \theta_{ij.} + \theta_{i.k} + \theta_{.jk} + \theta_{ijk} + \varepsilon_{ijkp},$    (35.89)

the generalization of (35.32). Here p indexes the observations in the (i, j, k)th cell.
The general mean $\theta_{...}$, the row-effects $\theta_{i..}$, column-effects $\theta_{.j.}$ and layer-effects $\theta_{..k}$,
the row-column interactions $\theta_{ij.}$, the row-layer interactions $\theta_{i.k}$ and the column-layer
interactions $\theta_{.jk}$ are defined exactly as in 35.17-18 for the two-way cross-classification,
with an extra suffix in every case. The last set of interaction parameters on the right
of (35.89), the $\theta_{ijk}$, are defined by extending the argument of 35.18. If the deviation
of $\mu_{ijk}$ from the general mean $\theta_{...} = \mu_{...}$ were exactly equal to the sum of the three
corresponding main effects and the three corresponding interactions already defined,
we should have

(35.89), however, contains 1 + r + c + l + rc + rl + cl + rcl parameters. We again use
linear relations among them to leave

    1 + (r-1) + (c-1) + (l-1) + (r-1)(c-1) + (r-1)(l-1) + (c-1)(l-1)

linearly independent parameters excluding the second-order interaction parameters,
of which (r-1)(c-1)(l-1) are linearly independent. The reader may verify by
addition that these bring the number of parameters to rcl, the number of cells.

35.43 Nothing but the heavy algebra now prevents our following through in
detail an analysis parallel to that already carried out in the two-way case. There is no
numerical difficulty in any given case about fitting the linear model (35.89), estimating
the rcl parameters and carrying out the AV. Even in the two-way case, however,
worthwhile simplifications in the algebra only occurred when the frequency in each
cell was proportional to the product of marginal totals as required by (35.43). Similar
orthogonality conditions now require that each cell frequency should be proportional
to the product of all the corresponding marginal frequencies (cf. Seber (1964a)). This
proportionality condition, in practice, is satisfied with equal frequencies in all the cells,
i.e. in the balanced case.

The general principles of the analysis are then very simply set out. We saw in 35.39
that we may regard a three-way (r x c x l) cross-classification as having been generated
by imposing a new (l-fold) classification upon every cell of an existing (r x c)
cross-classification. It follows that the AV can be carried out in two stages, exactly as for
the “mixed” classification of 35.38. First, we consider the (r x c) cross-classification
as a one-way classification with rc cells, and carry out the AV of the rc x l two-way
cross-classification in which our observations are then displayed. We obtain the
schematic AV table from (35.65):

SS due to                            D.fr.           Second-stage subdivision                     D.fr.

General mean                         1
(r x c) cross-classification         rc-1            rows                                         r-1
                                                     columns                                      c-1
                                                     row-column interactions                      (r-1)(c-1)
Layer classification                 l-1
Interactions of (r x c)              (rc-1)(l-1)     row-layer interactions                       (r-1)(l-1)
cross-classification with                            column-layer interactions                    (c-1)(l-1)
layers                                               row-column-layer second-order interactions   (r-1)(c-1)(l-1)
Residual                             n-rcl

                                                                  (35.92)
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01 c rgi X°re* lated ’ WC °1s n (with (r~ 1 H C TOrthogonal- remains 

S ! aS The WO remaining SS ^ ana Jf * ° tio „al frequences c 

obtainable * d f^AV in the ? en “f s Pearce (1963) r ^gonal three-way 
The comp utatl ° , e i eC tronic comput f or the non-or g Bradu 
Example 35.6 Balanced three-way cross-classification

In the case where all cell-frequencies are equal to m ≥ 1, it is obvious that
the same components would have been obtained had we started from an (r x l) x c or a
(c x l) x r classification in (35.92). It will be seen from this symmetry
that each of the three sets of main effects and each of the three sets of first-order
interactions will have its SS calculated exactly as in a two-way cross-classification table with
the third factor of classification merged. The Residual SS is clearly also unchanged in
form. In our present three-way notation, we therefore obtain the following expressions
from (35.63-4) and Exercise 35.1 for the SS corresponding to the components in
(35.92). The suffix to S now indicates its d.fr.


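General mean:

    $S_1 = \Big( \sum_i \sum_j \sum_k \sum_p y_{ijkp} \Big)^2 / (rclm),$

Row-effects:

    $S_{(r-1)} = \sum_i \Big( \sum_j \sum_k \sum_p y_{ijkp} \Big)^2 / (clm) - S_1,$

Column-effects:

    $S_{(c-1)} = \sum_j \Big( \sum_i \sum_k \sum_p y_{ijkp} \Big)^2 / (rlm) - S_1,$

Layer-effects:

    $S_{(l-1)} = \sum_k \Big( \sum_i \sum_j \sum_p y_{ijkp} \Big)^2 / (rcm) - S_1,$

Row-column interactions:

    $S_{(r-1)(c-1)} = \sum_i \sum_j \Big( \sum_k \sum_p y_{ijkp} \Big)^2 / (lm) - S_{(r-1)} - S_{(c-1)} - S_1,$

Row-layer interactions:

    $S_{(r-1)(l-1)} = \sum_i \sum_k \Big( \sum_j \sum_p y_{ijkp} \Big)^2 / (cm) - S_{(r-1)} - S_{(l-1)} - S_1,$

Column-layer interactions:

    $S_{(c-1)(l-1)} = \sum_j \sum_k \Big( \sum_i \sum_p y_{ijkp} \Big)^2 / (rm) - S_{(c-1)} - S_{(l-1)} - S_1,$

Residual:

    $S_{rcl(m-1)} = \sum_i \sum_j \sum_k \sum_p y_{ijkp}^2 - \sum_i \sum_j \sum_k \Big( \sum_p y_{ijkp} \Big)^2 / m.$    (35.93)

As always, we obtain by subtraction the remaining SS:

Second-order interaction:

    $S_{(r-1)(c-1)(l-1)} = \sum_i \sum_j \sum_k \Big( \sum_p y_{ijkp} \Big)^2 / m - S_{(r-1)(c-1)} - S_{(r-1)(l-1)} - S_{(c-1)(l-1)}$
        $- S_{(r-1)} - S_{(c-1)} - S_{(l-1)} - S_1.$    (35.94)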


Variation due to                    D.fr.              SS defined by (35.93-4)

Row-effects                         r-1                $S_{(r-1)}$
Column-effects                      c-1                $S_{(c-1)}$
Layer-effects                       l-1                $S_{(l-1)}$
Row-column interactions             (r-1)(c-1)         $S_{(r-1)(c-1)}$
Row-layer interactions              (r-1)(l-1)         $S_{(r-1)(l-1)}$
Column-layer interactions           (c-1)(l-1)         $S_{(c-1)(l-1)}$
Row-column-layer interactions       (r-1)(c-1)(l-1)    $S_{(r-1)(c-1)(l-1)}$
                                    rcl-1
General mean                        1                  $S_1$
Classification                      rcl
Residual                            rcl(m-1)           $S_{rcl(m-1)}$
Total                               rclm = n           y'y

                                                                  (35.95)


Any of the eight rows of (35.95) forming part of the “Classification” SS may be tested
against the Residual SS, just as previously, by the ratio of its SS/d.fr. to the Residual
SS/d.fr., this ratio being a central F variable if the hypothesis tested is true.

Multi-way cross-classifications 

analysf can T te f“ther S oenl d raH°zrd b , e f'’ *° “? h ° W ,he three - wa y cross-classificatic 
application of the argument we used to^bXlhTdire^ ^ 

more d™ct ; ym ^ etry - ° f (35 ' 95) b the bal “«d case, and also of TdS 93^ Y 
. KM generalEatl0n to classifications. We stu,d 
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tH E ADVANCED atoanSS andth a conse quence 

f the d.fr* attache easily see , . ^ are discussed 

- iform correspondency* repr esents--^ * ., 3 ab „ve, winch 
^dependent P aramet “,, 0 f the type referre and higher-order 

of geometrical argum ^ the definition o * onS tend to be so 

ss3 ; ars.'iis-*•••** 

analysis, their & 

to be tested, 

The combination of AV tests distinct hypotheses obtain 

to each test statistic, and directing 9 y w P. = P as a * 2 variable 

to small values of its transform P* we then ave gizes of the constituent 

with 2A d.fr„ large values of P being «£«• however, we encounter 

tests, we use a size-a test on P. If f t ts 0 f j n 30.36, and this combined 

of the joint distribution 

Another simple general approach to the combination of independent tests arises
from the observation that if the ith test has size $\alpha_i$, the probability of rejecting at least
one of the hypotheses tested when all are true is simply $1 - \prod_{i=1}^{k} (1 - \alpha_i)$, which reduces
when all $\alpha_i = \alpha$ to

    $P^*(\alpha) = 1 - (1 - \alpha)^k,$    (35.96)

approximately equal to $k\alpha$ when $\alpha$ is small, as it normally is in practice.
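The relation (35.96) is easily inverted to give the size at which each of k independent tests must be made to achieve a prescribed overall size. A minimal sketch follows; the function names and the choice k = 4 with overall size 0.05 are illustrative assumptions.

```python
def overall_size(alpha, k):
    """Probability of at least one rejection among k independent size-alpha tests."""
    return 1.0 - (1.0 - alpha) ** k


def per_test_size(overall, k):
    """Size for each of k independent tests giving the stated overall size."""
    return 1.0 - (1.0 - overall) ** (1.0 / k)


k = 4
alpha = per_test_size(0.05, k)
print(alpha, overall_size(alpha, k), k * alpha)
```

For small sizes the result is close to the overall size divided by k, in line with the approximation $k\alpha$ noted above; the exact per-test size is slightly larger than that quotient.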

35.46 Now if all the k tests in an AV table were independent, we could use (35.96)
to fix the overall size in testing the set of variance-ratios as a whole, so that if there
were k tests to be made at size $\alpha$, and we required overall size to be 0.05, we should
put

    $0.05 = 1 - (1 - \alpha)^k$

and solve for $\alpha$. However, although the various SS in a table
may be independent of each other, all the tests we have discussed use the Residual SS
as denominator of the test statistic, and the various test statistics must therefore be
statistically dependent; e.g. a Residual SS which is, by chance, large will depress the
values of all the test statistics simultaneously.
Write $G_i$ for the distribution function of $s_i^2/\sigma^2$ and g for the frequency function
of $x = s^2/\sigma^2$. Let the k values $F_i$ be defined as solutions, for a fixed $\alpha$, of

    $P\{ s_i^2/s^2 \le F_i \} = 1 - \alpha.$

The probability that none of the k statistics exceeds its $F_i$ is then

    $P(1) = \int_0^\infty \prod_{i=1}^{k} G_i(xF_i)\, g(x)\, dx.$    (35.97)

We have denoted (35.97) by P(1) because it is the value at $\theta = 1$ of the function

    $P(\theta) = \int_0^\infty \prod_{i=1}^{k} \big[ (1-\alpha) + \theta \{ G_i(xF_i) - (1-\alpha) \} \big]\, g(x)\, dx,$    (35.98)

and we see at once that $P(0) = (1-\alpha)^k$. In order to expand $P(\theta)$ in a Taylor series
about zero, we investigate its derivatives. We find

    $P'(\theta) = \sum_{i=1}^{k} \int_0^\infty \{ G_i(xF_i) - (1-\alpha) \} \prod_{l \ne i} \big[ (1-\alpha) + \theta \{ G_l(xF_l) - (1-\alpha) \} \big]\, g(x)\, dx,$

so that

    $P'(0) = (1-\alpha)^{k-1} \sum_i \int_0^\infty \{ G_i(xF_i) - (1-\alpha) \}\, g(x)\, dx = 0,$

since

    $\int_0^\infty G_i(xF_i)\, g(x)\, dx = 1 - \alpha.$    (35.99)

Thus the Taylor expansion is

    $P(1) - (1-\alpha)^k = \tfrac{1}{2} P''(\theta), \quad 0 < \theta < 1,$

        $= \tfrac{1}{2} \sum\sum_{i \ne j} \int_0^\infty \{ G_i(xF_i) - (1-\alpha) \} \{ G_j(xF_j) - (1-\alpha) \}$
          $\times \prod_{l \ne i,j} \big[ (1-\alpha) + \theta \{ G_l(xF_l) - (1-\alpha) \} \big]\, g(x)\, dx.$

Every term in square brackets lies between 0 and 1, since $\alpha$, $\theta$ and G do so, and if
we assume all these terms equal to 1 we therefore obtain the inequality

    $P(1) - (1-\alpha)^k \le \tfrac{1}{2} \sum\sum_{i \ne j} \int_0^\infty \{ G_i(xF_i) - (1-\alpha) \} \{ G_j(xF_j) - (1-\alpha) \}\, g(x)\, dx,$

from which the Cauchy-Schwarz inequality gives the blunter inequality

    $| P(1) - (1-\alpha)^k | \le \tfrac{1}{2} \sum\sum_{i \ne j} \Big[ \int_0^\infty \{ G_i(xF_i) - (1-\alpha) \}^2 g(x)\, dx \int_0^\infty \{ G_j(xF_j) - (1-\alpha) \}^2 g(x)\, dx \Big]^{1/2},$

and, even further,

    $| P(1) - (1-\alpha)^k | \le \tfrac{1}{2} k(k-1) \max_i \int_0^\infty \{ G_i(xF_i) - (1-\alpha) \}^2 g(x)\, dx;$

because of (35.99) we may write this

    $| P(1) - (1-\alpha)^k | \le \tfrac{1}{2} k(k-1) \max_i \big[ \mathrm{var} \{ G_i(xF_i) \} \big].$    (35.100)
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rriE ORT OF S'Iax-- ( 35 . 100 ) for a 

.pVANCFO bound g ive . Z ero as v in- 

THB computed the It decreases ^ 

Hartley (1955) has . P (J-oc) » “* f the rather ” cre ased to 60, the 

^ in 7t »V b0Und " °'° 5 ’ the bound is 0-0014. Since 

<r?0,*- » »f " \ = 5 , = 60 , max * - 5 - the bound (three 

1 nq to 0-0050. For k » r ; a bly less than the P j u( j e that the 

range ’ especiaHy 
in an AV Rested **£ 
r - (35 - %) in the 

each variance-ratio in the kidi 

form i_ tt = {l-ft(«)} 1/l 

and substitute . for ft W and g) ^ «, ™ (35-101) 


35.49 The result of 35.47-8 states that separate tests of size (35.101) on the $k$ variance-ratios with common Residual SS give an overall test of approximate size $\alpha$. However, the fact that the denominator SS is common to all $k$ tests suggests that it may be inefficient to make the tests separately in this way, since the result of any one of the tests gives relevant information for the others. A step-by-step procedure which utilizes this fact was suggested by Hartley (1955).

Define $H_i$ as the probability-integral transformation of $s_i^2/s^2$ to the uniform distribution on (0, 1) (cf. 1.27) and order the $H_i$ so that $H_{(i)}$ is the $i$th smallest of them. Since $\mathrm{Prob}\{H_i > 1-\delta\} = \delta$, a test of size $\delta$ on $H_i$ is obtained by comparing it with $1-\delta$. The largest of the $H_i$, $H_{(k)}$, is first tested with size $\beta(k)$ defined from $\alpha$ by (35.101), i.e. $\beta(k) = 1-(1-\alpha)^{1/k}$. If it is rejected, $H_{(k-1)}$ is tested with size $\beta(k-1)$, and so on, the procedure stopping as soon as a hypothesis is accepted, by which time all hypotheses corresponding to larger $H_i$ have been rejected and all those remaining are accepted.

35.50 This step-by-step procedure may be shown to have size $\alpha$. Suppose that of the $k$ variance-ratios, $c$ correspond to zero effects and $(k-c)$ to non-zero effects, and that of the $c$ transformed values $H_i$ corresponding to the former among the $k$ $H_i$, $H_{(l)}$ is the largest. If any of the “zero-effects” hypotheses are rejected by the test procedure, that corresponding to $H_{(l)}$ must be, since it is reached first in the step-by-step process. The probability that any “zero-effect” hypothesis is rejected is therefore

$$P = \mathrm{Prob}\{H_{(i)} > 1-\beta(i),\ i = l, l+1, \ldots, k\} \le \mathrm{Prob}\{H_{(l)} > 1-\beta(l)\}.$$

Since $l \ge c$, and $\beta(i)$ is a decreasing function of $i$, we have further

$$P \le \mathrm{Prob}\{H_{(l)} > 1-\beta(c)\} = \alpha, \tag{35.102}$$

since if $H_{(l)}$, the largest of a set of $c$ variance-ratios, is tested with size $\beta(c)$ we obtain an overall test of size $\alpha$ by (35.101). (35.102) shows that the step-by-step test never has size exceeding $\alpha$. If $c = k$, $l = k$ also and (35.102) becomes an equality.
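The step-by-step process can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hartley's own computation: it assumes the transformed values $H_i$ have already been obtained, takes $\beta(i) = 1-(1-\alpha)^{1/i}$ as the per-test size, and the function names `beta` and `step_by_step` are hypothetical.

```python
def beta(alpha, i):
    """Per-test size beta(i) = 1 - (1 - alpha)**(1/i); decreasing in i."""
    return 1.0 - (1.0 - alpha) ** (1.0 / i)


def step_by_step(h, alpha):
    """Return the indices of the hypotheses rejected by the step-by-step test.

    h[i] is the probability-integral transform H_i of the i-th variance
    ratio, assumed to be supplied by the caller.
    """
    # Visit the H_i from the largest downwards.
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    rejected = []
    remaining = len(h)  # rank of the H currently under test
    for idx in order:
        if h[idx] > 1.0 - beta(alpha, remaining):
            rejected.append(idx)
            remaining -= 1
        else:
            break  # first acceptance ends the process
    return rejected


if __name__ == "__main__":
    print(step_by_step([0.999, 0.5, 0.985, 0.2], 0.05))
```

Because the thresholds $1-\beta(i)$ fall as the process descends, a value that would not be rejected at the first step may still be rejected later, once larger values have been disposed of.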

Hartley (1955) gives some consideration to the power of the test when all mean 
squares have equal d.fr. 


Multiple comparisons 

35.51 Each of the variance-ratio tests in an AV table tests a hypothesis concerning 
a set of parameters, e.g. the row-effects or the interactions between rows and columns. 
For practical purposes, however, it is often not enough to know, for example, that the 
row-effects $\theta_{i.}$ are different; we need to know which of the $\theta_{i.}$ are to be regarded as greater than the others, or more generally whether the $\theta_{i.}$ may be said to fall into distinct groups.

Now the LS estimators $\hat\theta_{i.}$ are, of course, the MV unbiassed linear estimators of their corresponding parameters, and provide us with estimators of any of the differences $\theta_{i.} - \theta_{j.}$, but we are usually unable to nominate the differences of interest in advance and we therefore are faced with the problem of carrying out a number of non-independent tests on these differences. The discussion of 35.45-6 applies here with obvious changes, but the problem we are now concerned with is a more detailed one. Whereas in 35.47-50 we were dealing with combining tests on sets of parameters, we are now interested in closer examination of a particular set, say the row-effects.

In a sense, this problem of multiple comparisons, as it is called, is a more complex version of the problem of outlying observations, discussed in 32.23-8. Instead of being concerned about a location-shift in one or more observations, we are now more generally concerned with whether the observations (here the $\hat\theta_{i.}$) have expected values (the $\theta_{i.}$) which fall into distinct groups. Tukey (1953) reviews the subject.

The LSD test 

35.52 For the sake of definiteness, our discussion will refer to a one-way classification, as in Example 35.1, although there is no essential difference if we consider any set of effects in an AV table. In Example 35.1, the observed group means $\bar y_{i.}$ are the LS estimators of the $k$ group parameters $\theta_i$ (each of which includes the general mean as a common element). If the $F$-test at (35.22) rejects the hypothesis (35.20) that the $\theta_i$ are all equal, we are faced with the need to decide which subsets of the $\theta_i$ may be regarded as homogeneous, and which not.

The simplest test procedure is the oldest (“Student,” 1908), namely to carry out an ordinary “Student’s” $t$-test on each of the $\tfrac12 k(k-1)$ pairs of group means, using the estimate of error variance from the Residual SS. This amounts to calculating the estimated standard error of the difference between two means and comparing each observed difference with the appropriate multiple of it taken from “Student’s” distribution with the Residual number of d.fr. If the number of observations in each group is the same (equal to $N$), the estimated standard error of any difference is $(2s^2/N)^{1/2}$, where $s^2$ is the unbiassed estimator of the error variance, so that we need only calculate a single Least Significant Difference (the appropriate multiple of this standard error) and compare each of the observed differences with it. As a consequence, this is sometimes called the LSD test. One cannot say more in general about the LSD test than that, if the group means do not differ, an expected proportion $\alpha$ of all pairs will be wrongly adjudged heterogeneous.

A simple modification of the LSD test, proposed by Fisher (1935), is to reduce the size of each component test from $\alpha$ to $\alpha/\binom{k}{2}$. The expected number of pairs erroneously adjudged heterogeneous (when all group means are equal) is then $\alpha$.

it must be remembered that 

ft ftxnected error rates iust referred to are uncon 


Step-by-step and simultaneous test procedures 

35.53 Like the outlier problem of 32.23-5, which it resembles, the multiple comparisons problem was often discussed in terms of sample range criteria. The simplest of these is Tukey’s (1951, 1953) method. The group means, which we shall now write $x_i$, $i = 1, 2, \ldots, k$, are (on the hypothesis of homogeneity) a random sample of size $k$ from a normal distribution with variance $\sigma^2/N$, independently estimated by $s^2/N$, where $N$ is the number of observations in each group as before.
from $x_{(k-1)}$ are tested against $x_{(k-2)}$, and so on until no further “heterogeneous” verdict can be reached. Such procedures, in which the decision on each subset depends on previous decisions concerning a larger subset, are called step-by-step or stepwise procedures.

35.54 A different simultaneous test procedure employing a sum-of-squares, rather than a range, technique is suggested by Gabriel (1964). The “between-groups” SS (35.21) is calculated for every one of the $(2^k - k - 1)$ subsets of two or more of the $k$ groups, and tested against the fixed critical value obtained from the variance-ratio distribution with $(k-1, n-k)$ d.fr. The sample sizes need not now be equal. This procedure leads to transitive judgements in the sense that no subset can be adjudged heterogeneous when a larger subset containing it is not; i.e. no subset can be adjudged homogeneous unless all smaller subsets which it contains are also homogeneous (cf. Exercise 35.16). Clearly, this procedure contains the ordinary variance-ratio test as a component, when the subset is actually the whole set of $k$ groups. This implies that the test has overall size $\alpha$.

When all the $n_i$ are equal, Tukey’s test in 35.53 can be modified to be a “simultaneous test procedure” of the same type as Gabriel’s if every subset of group means, instead of merely every subset of adjacent group means, is tested by the range criterion. (This test has the additional property that if any set of more than two groups is adjudged heterogeneous, at least one subset of it would be.)

Both the Gabriel and the Tukey-Gabriel methods discussed in this section have the property that the probability of erroneously judging a subset to be heterogeneous decreases with the size of the subset. For $k = 8$ and 40 d.fr. for the Residual SS, this phenomenon is more marked for the former method; Gabriel (1964) gives tables. The Tukey-Gabriel method is much simpler computationally, especially for large $k$, but is only available when all $n_i$ are equal.
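The transitivity of the sum-of-squares procedure rests on the fact that the between-groups SS of a subset can never exceed that of a larger subset containing it. A rough Python sketch illustrates this; the helper names and the argument `critical_ss` (the fixed critical value, expressed on the SS scale and assumed to be supplied by the user from tables) are illustrative assumptions.

```python
from itertools import combinations


def between_groups_ss(means, sizes, subset):
    """Between-groups SS for the groups whose indices are in `subset`."""
    n = sum(sizes[i] for i in subset)
    centre = sum(sizes[i] * means[i] for i in subset) / n
    return sum(sizes[i] * (means[i] - centre) ** 2 for i in subset)


def all_subsets(k):
    """Yield every subset of two or more of the k groups."""
    for r in range(2, k + 1):
        yield from combinations(range(k), r)


def heterogeneous_subsets(means, sizes, critical_ss):
    """Subsets whose between-groups SS exceeds the fixed critical value."""
    return [s for s in all_subsets(len(means))
            if between_groups_ss(means, sizes, s) > critical_ss]


if __name__ == "__main__":
    means = [1.0, 2.5, 2.0, 4.0, 3.5]
    sizes = [3, 5, 4, 6, 2]  # unequal sample sizes are allowed
    print(len(list(all_subsets(5))))  # number of subsets tested
    print(heterogeneous_subsets(means, sizes, 5.0))
```

Since every subset is referred to the same critical value, a heterogeneous verdict on a subset carries over automatically to every larger subset containing it.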



35.55 Instead of using the fixed critical value of the $k$-observations studentized range, as in 35.53, we can test $(x_{(j)} - x_{(i)})/(s/N^{1/2})$ against the studentized range of $(j-i+1)$ observations, as suggested earlier by Newman (1939) and Keuls (1952). A new point now arises, for a set of $q$ adjacent (ordered) group means may be declared “homogeneous” while a subset of $p$ adjacent group means, contained within that set, is “heterogeneous” by this criterion. The Newman-Keuls step-by-step procedure adjudges a pair of group means heterogeneous only if every subset of adjacent group means containing that pair is heterogeneous by the studentized range test just defined, which takes account of the number in the subset.

The computational procedure is just as in the last paragraph of 35.53, except that the critical value in the studentized range test now varies in the component tests instead of being fixed as previously. Once again, the overall size is at once seen to be $\alpha$, since $(x_{(k)} - x_{(1)})$ must first be adjudged heterogeneous if any other difference is to be.
Duncan (1952, 1955, 1957) proposes what is essentially a modification of the Newman-Keuls procedure in which each difference $(x_{(j)} - x_{(i)})$ is tested against the $100(1-\alpha)^{j-i}$ per cent point of the studentized range of $(j-i+1)$ observations.



$$(x_i - x_j) - q_\alpha\, s/N^{1/2} \le \theta_i - \theta_j \le (x_i - x_j) + q_\alpha\, s/N^{1/2}. \tag{35.105}$$

trSS S ^tLtLT^ttZ^y equi-correlated multi- 
rmal x t . 

35.58 The method of 35.57 enables us to make simultaneous statements about $\tfrac12 k(k-1)$ differences $\theta_i - \theta_j$ with a known overall confidence coefficient $1-\alpha$. In many applications of AV, we are interested not only in the differences but also in other linear combinations of the $\theta_i$ with constant coefficients summing to zero. Such a linear combination is called a contrast, defined by

$$\psi = \sum_{i=1}^{k} c_i\theta_i, \qquad \sum_{i=1}^{k} c_i = 0. \tag{35.106}$$

nost obviously useful contrast other than the difference between anv 0 and fl 
rence between the average of anv subset nf*nf+ui. U and 

Interact* sTdefin d „3518SuPT* 8 ** the 
v ln 35 - 41 ) are also at once seen 

r is easily adapted 


of 35.57; 


so that 
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ANALYS1 qince the number of 

* -—-,.-355 

co ntra sts 1S 

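35.59 Write $z_i = x_i - \theta_i$ and let $\sum_{i=1}^{k} c_i = 0$. Consider the maximum possible value of $|\sum c_i z_i|$. Since $\sum c_i = 0$, the sum of the positive $c_i$ is $\tfrac12\sum|c_i|$, as is also the sum of the absolute values of the negative $c_i$. We therefore see that

$$\Big|\sum c_i z_i\Big| \le \big(\tfrac12\sum|c_i|\big)\max_{i,j}|z_i - z_j|,$$

i.e.

$$\Big|\sum c_i x_i - \sum c_i\theta_i\Big| \le \big(\tfrac12\sum|c_i|\big)\max_{i,j}|(x_i-\theta_i)-(x_j-\theta_j)|. \tag{35.107}$$

Referring back to (35.104), we see that (35.107) implies that, simultaneously for all choices of the $c_i$ with $\sum c_i = 0$,

$$\mathrm{Prob}\left\{\frac{|\sum c_i x_i - \sum c_i\theta_i|}{s/N^{1/2}} \le \big(\tfrac12\sum|c_i|\big)\,q_\alpha\right\} = 1-\alpha,$$

and hence that

$$\sum c_i x_i - \big(\tfrac12\sum|c_i|\big)\,q_\alpha\, s/N^{1/2} \le \sum c_i\theta_i \le \sum c_i x_i + \big(\tfrac12\sum|c_i|\big)\,q_\alpha\, s/N^{1/2} \tag{35.108}$$

is simultaneously satisfied for all contrasts $\psi = \sum c_i\theta_i$ with probability $1-\alpha$. The method again generalizes to negatively equi-correlated multinormal $x_i$ (cf. Exercise 35.11).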
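The inequality behind (35.107) is easily checked numerically. The sketch below is an illustration only: `contrast_bound` is a hypothetical helper name, and the random values stand in for the deviations $x_i - \theta_i$.

```python
import random


def contrast_bound(c, z):
    """Return (|sum c_i z_i|, (1/2) sum|c_i| * max_{i,j} |z_i - z_j|)."""
    lhs = abs(sum(ci * zi for ci, zi in zip(c, z)))
    # The largest absolute difference between two z's is their range.
    rhs = 0.5 * sum(abs(ci) for ci in c) * (max(z) - min(z))
    return lhs, rhs


def random_contrast(k, rng):
    """Coefficients summing to zero (up to rounding error)."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(k)]
    mean = sum(raw) / k
    return [r - mean for r in raw]


if __name__ == "__main__":
    rng = random.Random(1)
    z = [rng.gauss(0.0, 1.0) for _ in range(6)]
    print(contrast_bound(random_contrast(6, rng), z))
    # Equality holds for the difference between the two extreme values.
    print(contrast_bound([1, 0, -1], [3.0, 1.0, 0.0]))
```

The bound is attained only by the difference between the two extreme values; for contrasts with many non-zero coefficients it is typically far from sharp, which is the wastefulness in question.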


35.60 In 35.59, simultaneous confidence intervals for all contrasts were obtained from intervals for all differences by the use of a rather wasteful inequality. It is not surprising, therefore, that in general these are not the most useful intervals for all contrasts. To obtain a more useful set, we make an entirely different approach.

The estimator of any contrast (35.106) is

$$\hat\psi = \sum_{i=1}^{k} c_i\hat\theta_i. \tag{35.109}$$

Clearly,

$$E(\hat\psi) = \psi,$$

and, further, if the $\hat\theta_i$ are normally distributed, so will $\hat\psi$ be. If we now consider any set of $r$ $(\le k)$ estimated contrasts, which we write in the form $\hat{\boldsymbol\psi} = \mathbf{C}\hat{\boldsymbol\theta}$, it will be multinormally distributed (cf. 15.4, Vol. 1) with mean vector equal to $\boldsymbol\psi = \mathbf{C}\boldsymbol\theta$ and dispersion matrix

$$\mathbf{V} = \mathbf{V}(\hat{\boldsymbol\psi}) = \mathbf{C}\,\mathbf{V}(\hat{\boldsymbol\theta})\,\mathbf{C}'. \tag{35.110}$$

In our present discussion of the one-way classification with equal frequencies $N$, $\mathbf{V}(\hat{\boldsymbol\theta})$ is diagonal, with elements $\sigma^2/N$, so that

$$\mathbf{V} = \frac{\sigma^2}{N}\,\mathbf{C}\mathbf{C}'. \tag{35.111}$$

We assume that $\mathbf{V}$ is non-singular.
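35.61 The result of 15.10 now implies that

$$Q = (\hat{\boldsymbol\psi}-\boldsymbol\psi)'\,\mathbf{V}^{-1}(\hat{\boldsymbol\psi}-\boldsymbol\psi)$$

has a $\chi^2$ distribution with degrees of freedom equal to $r$, the rank of $\mathbf{V}$. Independently of $Q$, the Residual SS (divided by $\sigma^2$) is also distributed in this form with, say, $\nu$ d.f. Thus the ratio $(Q/r)/(s^2/\sigma^2)$, where $s^2 = (\text{Residual SS})/\nu$, has the variance-ratio distribution with $(r, \nu)$ d.fr. In the simplest case, (35.111), this gives the statistic

$$(\hat{\boldsymbol\psi}-\boldsymbol\psi)'(\mathbf{C}\mathbf{C}')^{-1}(\hat{\boldsymbol\psi}-\boldsymbol\psi)\big/(r s^2/N).$$

If we call the $100(1-\alpha)$ per cent point of this distribution $F_{\alpha,r,\nu}$, we now have

$$\mathrm{Prob}\{(\hat{\boldsymbol\psi}-\boldsymbol\psi)'(\mathbf{C}\mathbf{C}')^{-1}(\hat{\boldsymbol\psi}-\boldsymbol\psi)/(s^2/N) \le rF_{\alpha,r,\nu}\} = 1-\alpha. \tag{35.112}$$

The corresponding general result is available by using (35.110) instead of (35.111).

35.62 Since $\mathbf{V}$ must be non-singular, $r$ cannot exceed $q$, the number of linearly independent contrasts. Taking $r = q$, it may be shown (cf. Exercise 35.19) that, simultaneously for all contrasts $\psi$,

$$\mathrm{Prob}\{(\hat\psi-\psi)^2 \le qF_{\alpha,q,\nu}\,\widehat{\mathrm{var}}(\hat\psi)\} = 1-\alpha, \tag{35.113}$$

where $\widehat{\mathrm{var}}(\hat\psi)$ is the estimated variance of $\hat\psi$.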

Scheffé (1953, 1959) went on to show numerically that the intervals for all contrasts yielded by (35.113) are generally shorter than those obtained from (35.108) unless the contrasts happen to be differences (for which (35.108) reduces to (35.105), designed specifically for differences) or otherwise have very few non-zero $c_i$. Moreover, Scheffé’s method is not restricted by the need to have $(x_i - \theta_i)$ distributed with equal variances, an assumption fundamental to the argument of 35.57.
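The comparison can be illustrated for $k = 4$ groups of equal size with 20 d.fr. for Residual, so that $q = 3$. In the Python sketch below the 5 per cent critical values (3.96 for the studentized range of four observations, 3.10 for the variance-ratio with (3, 20) d.fr.) are approximate table values supplied as assumptions, and the function names are hypothetical.

```python
import math


def tukey_half_width(c, s, N, q_crit):
    """Half-width of the interval (35.108) for the contrast with coefficients c."""
    return 0.5 * sum(abs(ci) for ci in c) * q_crit * s / math.sqrt(N)


def scheffe_half_width(c, s, N, q, f_crit):
    """Half-width from (35.113): sqrt(q * F * estimated var of the contrast)."""
    est_var = s ** 2 * sum(ci ** 2 for ci in c) / N
    return math.sqrt(q * f_crit * est_var)


if __name__ == "__main__":
    s, N = 1.0, 1                  # half-widths in units of s / sqrt(N)
    Q_CRIT, F_CRIT = 3.96, 3.10    # assumed approximate 5 per cent table values
    difference = (1, -1, 0, 0)
    averages = (0.5, 0.5, -0.5, -0.5)
    for c in (difference, averages):
        print(c, tukey_half_width(c, s, N, Q_CRIT),
              scheffe_half_width(c, s, N, 3, F_CRIT))
```

With these values the range-based interval is the shorter for a simple difference, while for the contrast between two pairs of averages the interval from (35.113) is the shorter.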


35.63 If we now reconsider the variance-ratio test of the overall hypothesis that all the $\theta_i$ are equal ((35.22) in the one-way classification case) we see that this hypothesis (cf. (35.20)) states that $q$ linearly independent contrasts are all zero. This implies that all contrasts are zero, for every contrast may be regarded as a linear combination of the $q$ linearly independent ones. Thus the variance-ratio test is logically equivalent to testing the hypothesis that each of the infinite number of possible contrasts is zero, i.e. seeing whether at least one of the infinite number of intervals given by (35.113) does not cover the value zero. (See also Exercises 35.12 and 35.19.)

This is the main use of Scheffé’s all-contrasts method: once the overall test has rejected the homogeneity hypothesis, the all-contrasts method may be used to examine any contrasts to reveal whether they are in fact the reasons for rejecting the hypothesis, and to calculate confidence intervals for them; they need not be nominated in advance. A natural way of seeking the contrasts which are to “blame” for the rejection of overall homogeneity is to start with all $\tfrac12 k(k-1)$ differences. All this may be done without affecting the size of the overall test. If the reader will now refer back to the original discussion of the purposes of multiple comparisons in 35.51, he will probably agree that Scheffé’s all-contrasts method is very close to achieving those purposes.

Gabriel (1966) gives a general theory of simultaneous test procedures. 

35.64 Dunn (1961) considers a procedure intermediate between setting confidence 
intervals for a single contrast and setting them for all contrasts. Her method requires 
the prior nomination of m contrasts as of special interest. The intervals obtained 
(based on “Student’s” $t$-statistic) are shorter than those obtained from either Tukey’s or Scheffé’s method if $k$ (the number of parameters) exceeds 2 and $m$ is not too large; this advantage increases as $k$, or the number of d.fr. for Residual, or the confidence coefficient $1-\alpha$, increases. The very simple result which underlies this method is
given in Exercise 35.14. The procedure is improved by Siotani (1964). 

Ordered and metrical classifications 

35.65 Throughout this chapter, the classification variables have been quite general, no assumption having been made about whether, e.g., the groups in a classification are ordered in any way. However, if precise information is available concerning the basis of the classification, the SS in the AV table can be further partitioned into corresponding components. For example, if it is known that the groups correspond to equally-spaced values of an underlying variable, the orthogonal polynomials discussed in 28.18-20, Vol. 2, may be used to assign a single d.fr. to the linear, quadratic, cubic, and higher-degree effects of the classifying variable, if necessary proceeding until all $(k-1)$ d.fr. are exhausted. The method used is precisely that of Example 28.3.

In more complex classifications, interactions as well as row- (or other) effects may be partitioned in this way if all the underlying variables are equally spaced. Computational methods are given by R. L. Anderson and Bancroft (1952) and in Fisher and Yates’ Tables.
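As an illustration of such a partition, the sketch below splits the between-groups SS for $k = 4$ equally-spaced groups, each of $N$ observations, into linear, quadratic and cubic components of one d.fr. each. The coefficients are the standard orthogonal polynomial values for four points; the function names and the data are hypothetical.

```python
# Orthogonal polynomial coefficients for four equally-spaced groups.
ORTHO_K4 = {
    "linear": (-3, -1, 1, 3),
    "quadratic": (1, -1, -1, 1),
    "cubic": (-1, 3, -3, 1),
}


def contrast_ss(means, coeffs, N):
    """SS with one d.fr. for the contrast sum(c_i * mean_i),
    each group mean being based on N observations."""
    estimate = sum(c * m for c, m in zip(coeffs, means))
    return N * estimate ** 2 / sum(c * c for c in coeffs)


def between_groups_ss(means, N):
    """Between-groups SS with k - 1 d.fr. for equal group sizes N."""
    grand = sum(means) / len(means)
    return N * sum((m - grand) ** 2 for m in means)


if __name__ == "__main__":
    means, N = [2.0, 3.5, 3.0, 6.0], 5   # illustrative data only
    parts = {name: contrast_ss(means, c, N) for name, c in ORTHO_K4.items()}
    print(parts)
    print(sum(parts.values()), between_groups_ss(means, N))
```

The three components add up to the between-groups SS, all $(k-1) = 3$ d.fr. being then exhausted.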

35.66 Bartholomew (1961) considers the case of an ordered classification as an alternative to the hypothesis of homogeneity of the groups. This is precisely the situation discussed in 31.74. When the $n_i$ are all equal, the distribution-free test based on (31.151) is found to have higher power asymptotically than the LR test if the $\theta_i$ are equally spaced, but to be less powerful if, at the other extreme, all the $\theta_i$ are equal except one. See Exercise 35.15, and also the paper by Chacko (1963).

Analysis of covariance 

35.67 A natural extension of AV arises when, in an analysis of classified data 
such as we have been discussing in this chapter, we have available to us not only the 





C 'rAT lSrlCS „ known or sus. 

-rtEO^ ° F fh6 r > 3b Sd outhet « 

nVAN cED 1 or m°r e Rifled. invest now * 

THE ***olo^otd^ wi^^d of *e measured 

igQ th e tfth e ^ > b ut^ b ^ipx oiae ^ a ^n^y sis * 

:rV^ of v °* ^oibly sUcb ,»ssion «****£ 


;-Jltt T jjjgo the val j/the d ata ^but ^ at !jgX o^ 6 ) aIld °naly sis - 

50 . on / but lueo^ r 1 on the * S, -hlv a c °^( r such an . n methods, so 

obs e ^7nfl ue ° ceth n Vanalysi f °sific ati ° n (P°u e p ufp° se ,° b y reg resSl T t r discounting 

pP° sS , e elh*^ fupon J ^'ion (before , 

• j;S »sfS»>- ffS 

sr/.«-s ir3«s ",r--s';S”-»»* "* 
ff ':' W<-ss“ .s.»-** ««£,'■ 

sS'T^* 1 -"'^-tss&s*" 

It will obviously m* ielding the classic . the regres . 

• ««« rase. • i_. additionally) be interested “ ask whether 


Juction w — . interpret 11 "* ....jfication, nut. 

It will obviously yielding the da _ nether the regres- 

^ ^ but 

^£t?*£j***r beween the ’ 

intrinsic interest 

e ,i ^as rome to be calk 


• iance , 
are 


Whichever method it|W* betwee „ the variables. 

intrinsic interest in ^ ^ Cmananc 

35.68 This branch of the subject has come to be called the Analysis of Covariance, because the regression calculations involve the partition of sums of products of $y$ and $x$ in the same way as ordinary AV involves the partition of SS. The variables $x$ are usually called concomitant variables, implying that $y$ is the variable of prior interest.

An extended expository review of uses of the Analysis of Covariance is contained in a set of seven papers in the September 1957 issue of Biometrics (Vol. 13, No. 3). A clear introductory account is given by D. R. Cox (1958). We shall be concerned only with its theoretical aspects.


35.69 Since we have already seen that regression analysis and AV can each be treated within the framework of a linear model, it is evident that Analysis of Covariance, a mixture of the two, can be so treated. The interpretative convenience of having the concomitant variables unaffected by the treatments is now seen to be a special case of the convenience of having different sets of regressors uncorrelated

rSys^^r set up a ^ ^ “S “ S 

^ if the AV which is (so to speak 
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to T 

of a linear model to include further parameters 
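35.70 Suppose that a linear model

$$\mathbf{y} = \mathbf{X}\boldsymbol\theta + \boldsymbol\epsilon \quad \text{(singular or non-singular)} \tag{35.114}$$

has been fitted, and that the LS estimator of $\boldsymbol\theta$ obtained by the methods of Chapter 19 is

$$\hat{\boldsymbol\theta} = \mathbf{T}\mathbf{y}, \tag{35.115}$$

where of course $\mathbf{T} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ in the non-singular case. We find that the Residual SS is

$$(\mathbf{y}-\mathbf{X}\hat{\boldsymbol\theta})'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol\theta}) = \{(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y}\}'\{(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y}\} = \mathbf{y}'(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y}, \tag{35.116}$$

since $\mathbf{T}\mathbf{X} = \mathbf{I}$ is the condition for unbiassedness of $\hat{\boldsymbol\theta}$. The matrix $(\mathbf{I}-\mathbf{X}\mathbf{T})$ is idempotent.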

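35.71 Now consider the extended model

$$\mathbf{y} = \mathbf{X}\boldsymbol\theta + \mathbf{Z}\boldsymbol\beta + \boldsymbol\epsilon, \tag{35.117}$$

or

$$\mathbf{y} - \mathbf{Z}\boldsymbol\beta = \mathbf{X}\boldsymbol\theta + \boldsymbol\epsilon.$$

If $\boldsymbol\beta$ were known, this would have the solution, from (35.114-15),

$$\hat{\boldsymbol\theta} = \mathbf{T}(\mathbf{y}-\mathbf{Z}\boldsymbol\beta), \tag{35.118}$$

and thus the LS solution of (35.117) for $\boldsymbol\theta$ and $\boldsymbol\beta$ may be obtained by solving for $\boldsymbol\beta$ alone

$$\mathbf{y} = \mathbf{X}\mathbf{T}(\mathbf{y}-\mathbf{Z}\boldsymbol\beta) + \mathbf{Z}\boldsymbol\beta + \boldsymbol\epsilon$$

or

$$(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y} = (\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{Z}\boldsymbol\beta + \boldsymbol\epsilon. \tag{35.119}$$

(35.119) is a linear model, and we assume that it is non-singular. We therefore have, from Chapter 19,

$$\hat{\boldsymbol\beta} = \{\mathbf{Z}'(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{Z}\}^{-1}\mathbf{Z}'(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y}, \tag{35.120}$$

$$\mathbf{V}(\hat{\boldsymbol\beta}) = \sigma^2\{\mathbf{Z}'(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{Z}\}^{-1}. \tag{35.121}$$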

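35.72 The reduction in the Residual SS due to the extension of the model is

$$\{(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{Z}\hat{\boldsymbol\beta}\}'\{(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{Z}\hat{\boldsymbol\beta}\} = \hat{\boldsymbol\beta}'\mathbf{Z}'(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y}. \tag{35.122}$$

On comparing this with (35.116), we see that they embody the same matrix $(\mathbf{I}-\mathbf{X}\mathbf{T})$; (35.122) differs only in that $\hat{\boldsymbol\beta}'\mathbf{Z}'$ replaces $\mathbf{y}'$ in premultiplying this matrix. This simplifies the computation of (35.122), since we have to replace a quadratic form in $\mathbf{y}$ by a set of corresponding bilinear forms, obtained by each column of $\mathbf{Z}$ in turn replacing the premultiplying $\mathbf{y}$ in (35.116). These bilinear forms, assembled into a column vector, are premultiplied by $\hat{\boldsymbol\beta}'$ to obtain (35.122).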
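For a one-way classification with a single concomitant variable, the computations of 35.70-2 reduce to simple sums. The Python sketch below assumes that case only: applying $(\mathbf{I}-\mathbf{X}\mathbf{T})$ then amounts to subtracting group means, and $\mathbf{Z}$ is a single column $z$. The function names and data are hypothetical.

```python
def within_group_deviations(values, groups):
    """Apply (I - XT) for a one-way classification:
    subtract from each value the mean of its own group."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def covariance_adjustment(y, z, groups):
    """Return (beta_hat, Residual SS before extension, reduction in SS)."""
    ry = within_group_deviations(y, groups)        # (I - XT) y
    rz = within_group_deviations(z, groups)        # (I - XT) z
    szz = sum(a * a for a in rz)                   # z'(I - XT)z
    szy = sum(a * b for a, b in zip(rz, ry))       # z'(I - XT)y, a bilinear form
    beta_hat = szy / szz                           # (35.120), single column
    residual_ss = sum(a * a for a in ry)           # (35.116)
    reduction = beta_hat * szy                     # (35.122)
    return beta_hat, residual_ss, reduction


if __name__ == "__main__":
    groups = [0, 0, 0, 1, 1, 1]
    z = [1.0, 2.0, 4.0, 0.0, 3.0, 5.0]
    y = [12.5, 14.0, 18.0, 20.0, 25.0, 30.0]       # illustrative data only
    print(covariance_adjustment(y, z, groups))
```

The same within-group deviations serve for the quadratic form in $y$ and for the bilinear form in $z$ and $y$, which is the computational economy noted above.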
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exercises 

■r n «•• = w the SS in (35.63-4) become 

35.1 Verify that if a ^ = ’(S S SytfflVfro"). 

4 % j P 

s = 2 (SS^VM-^i 

l j P 

S, = S(SSwp)VM-^ 

i t 2 > 

53 = 211(2 yupY/m - (S x + ^2 + ^ 4 ), 

i j P 

Sr = 2 2 2 3^f/p — (5i + tS , 2 + *S ' 3 + £ 4 ), 

* i P 

and show that if in = 1, these SS reduce (the summation over p now being redundant) to 

S t = (2 2 yaYKrc), 

i 3 

Si = 2 (2w)Vc-^» 

l j 

S, =S(S W )‘/r-S 1 , 

3 i 

S3 = XU yfj — (S x + S 2 + iS 4 ), 

*Sy? = 0. 

35.2 Verify that if all mj = m, the matrix C defined at (35.39) becomes 

'2E E E . . v\ 

E 2E E . . ; p\ 




( r ~-l)(c~l)x(r-l)(c-i) 



= 



•E • 2E 


/ 








where 


Show that 
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C" 1 = — 



(r-l)E- 1 

-E-* 


-E- 1 


-E-t . . . -E-i 


* . -E- 1 
-E-i (r —1)E -1 , 


where 



and hence verify that the LS estimator of 0 t j is given by (35.50). 



35.3 Generalize the AV table of Example 35.5 to a three-way hierarchical classification 
and give the three LR tests of the hypothesis that there is no variation in (a) the group means; 
(b) sub-group means within groups; and (c) sub-sub-group means within sub-groups. 

35.4 In Example 35.4, show that if it is postulated that H z of (35.54) holds, so that there 
are known to be no interactions, the SS attributable to row-effects is $M - S_2$, and the SS attributable to column-effects is $M - S_1$, where $S_1$ and $S_2$ are defined by (35.59-60). Show that the
Residual MS has (n-r-c + 1) d.fr. in this case. 


35.5 Show that if, in a (r x 2) cross-classification, the differences of cell-means d% = yh.—yn. 
are analysed by the method of weighted squares of means applied to the yi in 35.31, the SS 
2 Wiidi- 2 Widi/Yi Wi) 2 , where Wi = ttf + rtf, provides the test of the hypothesis that the 
i i * 

interactions are all zero. /y t 1934) 


, c , T r-> a 9 v 9 v 2 x = 2 m cross-classification, show that if a 2 x 2™" 1 table is formed 

35,6 * c +u i e 0 ;fWinn<? (A sav) against all possible combinations of the others, the 

for any one of * e j“ff“enc«’of^he mw-cell means provides an efficient estimator and 

r e of g the eSe e c a t of i st” this is a generation of 35.3!. and that any interacts 
can similarly be tested. 


/\7„a-_1 Q'lAA 


35.7 The table below gives Brandt’s ^•^“^.^U^frequin^’tiij and the cell totals 

classification by breed and sex o£ 533 f S f“ g t a g e bacon yielded by the carcass) are shown 

of the variable studied (the logarithm of the percentage 


overleaf. 




Exercise 


’ D< 


S>' W 


Varia 1 ' 00 


Due to breeds 
DueWjsefl 

Breed 5 an^s eXeS 

Interactions^ 

Total between classes 
Residual 


1-2715 

11-7427 

13-0142 


0-0329 

0-0848 

0-0227 


Re S idual^-— - 532 , not interact 

bacon yield, but that they 
1 . uoon Dacuu j 


/ 


. , bBe a a »d sex «=h have effect »P»" “““ ’. tha t the AV table 

5 h 0 W that breed in Example 35.0, inte ractions a-e 

35.8 Ifd,ere d is ^ y *”foS , SS = 0, aid‘ becomes the Residual 

35 95) holds good with th w ith (r-1) ( c U { J 

<5- aj other' parameteK^cf. Example 35.3). 

;s for testing ^ ^ observations i n every cell 

35.9 Show that the AV for a (rxc) ™r^. ckssiScati() n with exactly one observation 
toSrCr diSonfactor is “ replication,” its main effect and all interactions 

incemed with it being defined to be identically zero. 

35.10 For the Newman-Keuls procedure defined in 35.55, show that if the true means of any subset of $p$ of the $k$ groups are all equal, while all the other groups have different means, the probability of wrongly adjudging a pair in the “cluster” to be heterogeneous cannot exceed $\alpha$.

If there are $m$ such clusters of truly equal means, show that the probability of wrongly adjudging a pair from any such cluster to be heterogeneous cannot exceed $m\alpha$.

(Hartley, 1955)

V- diStributed with 311 variances ) 

pmdenfly normal if *„ isl norma | variabWndepaident o'f the - V1,r ^ ,eS « = are 

/ • By applying the method of 35.57 to tbo ~ u Xl Wlt}l zero mean i 

r,« . on__ the zt > sh °w that, With nmhnha;.. 

0 :i < 



l0 1 


2ero m ean and variance 
” Sh0w that > W «H probability 1 - a 

f/4 , ._* ’ 















simultaneously for all $\tfrac12 k(k-1)$ differences $\theta_i - \theta_j$. Show that the result of 35.59 generalizes in exactly the same way. (Scheffé, 1959)
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12 Jn the one-way classification of Example 35.1, without loss of generality, take 

* ^ 6 { as origin and a as unit. Show that the value of t* = (S « »*)*/(? ^/m) is maximized 

A ‘ hoice of the « when « oc nA, so that S « = 0 and | * | is the largest observed absolute 

• of a contrast to its standard error. Show further that t 2 = S^, the numerator SS (d 
ra 735 21 )) of ^e overall variance-ratio test defined by (35.22), so that the overall test essentially 

tests the largest observed contrast. (This result holds quite generally— 

cf. Scheffe (1959) and Gabriel (1964)) 


35 !3 By defining a dummy parameter 0 O = 0 with estimator 0 = 0 also 
^ combinations V = S dOt (where S « need not be zero) may have confidence intervals 

* * - < . . i _ j _r /oc i nQA t-nav similarly 


linear comoinauvn* y ^ -r —- ' 

t for them by (35.113) with q increased by 1 ; and that the method of (35.108) may similar y 
be used if k is increased by 1 and £ E | ct | is replaced by 


max {E a, ct > 0 ; | E ci |, Ci<0} 


(Tukey, 1953; Scheffe, 1959) 


35 14 In 35.64, consider k non-independent events with equal probabilities P x of °^ cur ”^- 
Show that Pic, the probability that they all occur, satisfies Pic>l-k(\-P 1 ), and let P x b 
probability 1 -a that a “ Student’s” ^-statistic (v d.fr.) hes in the interval ( t A , *«). Sh 
” * . ~ _i_7 — v 


iha« if m linear combinations X, = jttl, » = 1 , 2 .-m. are estimated by i, = S otft, 

then is distributed in" Student's ” form with v d.fr. for each s. Hence show 

,ha * Prob{l.-a^a.)]‘«il-<'« + ^( , *H*>>l-«- m ..... 

(Dunn, 1961) 


35 15 Consider the distribution-free statistic U for testing k samples against ordered alterna¬ 
tives defined at (31.151), and the competitive statistic U' defined similarly, except that U pq 

is replaced by 


np Vq V \ 

UL = E E (Xpi-Xqj) m npVq{Xp-X q ) t 


J V<1 


i= 1 3 = 1 

so that k j. 

U' = 2 2 n P nq(xp-x Q ). 

p =l 

For normally distributed observations m. with means £(*<•) = »< and equal variances < show 
that if n - n - = nt = N, the asymptotic power functions of U and U infesting equality 

of the fl, against the alternative hypothesis that the 0, are equally spaced, are respectively 

G{A(3/^-U and G{A-A a }, 

where G is the standardized normal d.f., A 2 = S <#,-«)•/•«. and G{-U = « as in Chapter 2S. 

9 . = 1 - - 


Deduce that the ARE of U compared to V il"3/m Show that this last result holds whatever 
the relations among the ordered 0*. (Bartholomew, 1961) 
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fudging aay ^ ss (35 . 21) , but applied 

tep-by-steP P^f U ; C e b same overall size « ^"luHsed 

35.5 >- *« ffaragPph o^.». fbystep a subset can 

i"* ero T r °„ f f 35 5t Show that for s “ c e h s a ubse ? being tested, and^ ^^ous method. 

Z&S&&-*--**** ( 


, 35 62 3 show by the Cauchy inequality that 
35.19 In 35.62-i, snow y * . a 

max's £ euDi , =1 i= i . , • 

3 „d hence that the sq-ed "* *at the 0. are 

is distributed as (S cM s *> wnere r 

equal. Hence estlblish (35.1W is due t0 M- H . Belz and A. M. W. Verhagen 
CHAPTER 36

OTHER MODELS FOR THE ANALYSIS OF VARIANCE

OJtlK'*' wit k t ^ e applica- 

tion of t. he g t ne whole of the discussion was the assumption, through their mean 
Underlying t e the classification affected the ° s ^ & dl therefore, Analysis 

the linear mode , the case of the general linear mo , arr i e d out 

^’•teTiv 0 )"ate.y described as an analysis 

of Variance : ( ^ o£ squares (SS) compute d from the ° ler cases , e ven 

through « b ie f t tliat we are led to very simi ar (an Q f situation. 

It,S n computations of SS when invest,gating a quite different to obsc ure 

’ de "“ Ca ^rly development of the subject this sum! anty mode l s , which 

£■ “by StV“). er ^ g w 

r-Srrr-tss"—- «• 

= r„5.“f5t 4- *. Model . 

“ Analysis of Variance” must "owbe^ broadened We defurn ^ ^ gg 

^Up^enttsTttnLUirto various factors, acting singly and in comb,natron. 

In place of the general linear model (19.8), consider the superficially similar model

    $y = 1\theta + Xu + e.$    (36.1)
    (n x 1) (n x 1) (n x p)(p x 1) (n x 1)

In (36.1), as in Chapter 35 (and earlier in Exercise 19.1, Vol. 2), we isolate the “general mean” $\theta$, which here will need no subscript. As before, 1 is a vector of units and X a known matrix of constants, while e is the vector of errors in the observations. The crucial change is the replacement of the parameter-vector $\theta$ in (19.8) by a vector u of p random variables. Thus (36.1) states that $y_i$ (i = 1, 2, ..., n) is composed of the general mean $\theta$, plus a linear combination of p random variables $u_j$, plus an error term, $e_i$. There are (p + 1) random components of $y_i$, instead of only the one in (19.8).

We assume, as at (19.9-10) for the general linear model, that

    $E(e) = 0;\quad V(e) = E(ee') = \sigma_e^2 I,$    (36.2)

where we have added an identifying subscript to the error variance $\sigma_e^2$ to distinguish it from the variances of our new random variables. We further assume that

    $E(u) = 0,$    (36.3)
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doa. > ableS del ar SnalW °‘, Suppose, 

Th“ s all ° ur r est i« * e the ^ t0 be «» } ; n the form 

;^rssS'»-—«,%«• . ”" J ” 

ar e now vectois ( variance <Sy 

where the u i \ ith ze ro mean matrix of 

. J(rfl 

of X ,« 51 we write W for *0 Jf(36.2-4). (36.6) 


efore we wish to estimate *^TthTobservations, ^2 e 

•SffifSi- m0del f0 r , 

of . h ( U t 16 51 is often called a «W9> general linear / 

to'thV** Model II” label e tia i distinction ^f/y^fe noTdiagonal), since they 

(36.6) emphasizes an ssent ^ uncorre i ate d (V » not 

^'l^Strfnnctions of the same vanables u. 


CJVP 


a''.««“££S”p^-<» 

"iSi kt g».|. -a th. «1> -k”™""" 

in the ith group is

    $y_{iq} = \theta + u_i + e_{iq},$

where the $u_i$ (the p components of $u_1$) and the $e_{iq}$ all have zero means, are all uncorrelated, and have variances

    var $u_i = \sigma_1^2$, all i,
    var $e_{iq} = \sigma_e^2$, all i, q.

It follows immediately that

    var $y_{iq} = \sigma_1^2 + \sigma_e^2,$

so that the terms of the model are literally components of the variance of the observations, which explains the name “components of variance” which we have attached to the model (36.5) in general.










36.4 Our investigation of the properties of the model (36.5) must start from 
the beginning; none of the LS theory used in the general linear model is now applicable. 
Our treatment follows that of Graybill and Hultquist (1961). 

(36.5) can be written more symmetrically as

    $y = \sum_{i=0}^{k+1} X_i u_i,$    (36.7)

where we define

    $X_0 = 1,\quad u_0 = \theta$ (scalar), $\quad X_{k+1} = I,\quad u_{k+1} = e.$

There are (k + 2) parameters in (36.7), namely $\theta$ and the (k + 1) variances appearing in the model. The assumptions (36.2-4) are now summarized as

    $E(u_j) = 0,\quad V(u_j) = E(u_j u_j') = \sigma_j^2 I,$
    $E(u_i u_j') = 0,\quad i \ne j;\quad i, j = 1, 2, \dots, k+1,$    (36.8)

where $\sigma_{k+1}^2 = \sigma_e^2$. Formally, we may write

    $\sigma_0^2 = E(u_0 u_0') = \theta^2,$    (36.9)

so that our parameters are $\sigma_j^2$ (j = 0, 1, 2, ..., k+1). We can now rewrite (36.6) as

    $W = E(yy') = \sum_{j=0}^{k+1} \sigma_j^2 A_j,$    (36.10)

where $A_j = X_j X_j'$, and

    $V_y = \sum_{j=1}^{k+1} \sigma_j^2 A_j.$    (36.11)


Unbiassed quadratic estimation of the parameters 

36.5 In our investigation of the general linear model, we found at (19.19) the condition which must be satisfied if linear functions of the parameters are to be unbiassedly estimated by linear functions of the observations. In the components of variance model (36.7), the parameters occur in (36.10) as coefficients in W, the matrix of second-order moments of the observations, and it is natural to seek quadratic estimators of them. We now prove that a necessary and sufficient condition that the $\sigma_s^2$ be unbiassedly estimable by quadratic forms $y'C_s y$ is that the matrices $A_j$ are linearly independent.


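36.6 First, assume that there exist matrices $C_s$ such that

    $E(y'C_s y) = \sigma_s^2.$    (36.12)

Using (36.7-8), this implies that

    $\sigma_s^2 = E\{(\sum_j X_j u_j)' C_s (\sum_j X_j u_j)\} = E\{\sum_j u_j' X_j' C_s X_j u_j\}.$    (36.13)

As at (19.38),

    $E(u_j' B u_j) = \sigma_j^2 \,\mathrm{tr}\, B,\quad j = 1, 2, \dots, k+1,$    (36.14)

and this also holds trivially for j = 0, so (36.13) becomes

    $\sigma_s^2 = \sum_{j=0}^{k+1} \sigma_j^2 \,\mathrm{tr}(X_j' C_s X_j) = \sum_j \sigma_j^2 \,\mathrm{tr}(X_j X_j' C_s) = \sum_j \sigma_j^2 \,\mathrm{tr}(A_j C_s).$    (36.15)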
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Sufficient statistics in the normal commutative case 

36.8 So far, no assumptions have been made concerning the distributional forms of the random variables $u_j$ in our model. We now investigate the case where each $u_j$ (j = 1, 2, ..., k+1) is multinormally distributed. Together with the zero means and covariances assumed in (36.8), this implies that the p random variables $u_j$, j = 1, 2, ..., p, are independent normal variables with zero means.

The normality assumption alone will not take us very much further; to make substantial progress, we must impose the additional condition of commutativity,

    $A_i A_j = A_j A_i,\quad i, j = 0, 1, 2, \dots, k+1.$    (36.21)

The $A_j$ are symmetric, so we always have

    $A_i A_j = A_i' A_j' = (A_j A_i)'.$

Thus what (36.21) requires is that each $A_i A_j$, as well as each $A_j$, be symmetric.

That this condition is restrictive is shown in Example 36.2.

Consider again the one-way classification situation in Example 36.1, with k = 1. Here $A_0 = 11'$ as always, and from Example 35.1, $A_1$ is the block-diagonal matrix

    $A_1 = \mathrm{diag}\{(11')_{n_1}, (11')_{n_2}, \dots, (11')_{n_p}\},$

where the suffixes give the number of rows and columns in the submatrices. Multiplication shows that $A_0 A_1$ in its first $n_1$ columns has every element equal to $n_1$, in its next $n_2$ columns has every element equal to $n_2$, and so on until in its last $n_p$ columns every element is equal to $n_p$. Thus $A_0 A_1$ is symmetric ($A_0$ and $A_1$ commute) only if all the $n_i$ are equal. Since $X_2 = I$, $A_2 = I$ also and always commutes. The present theory will therefore cover the one-way classification only in the balanced case, with equal frequencies in the p groups. Contrast Example 35.1 for Model I, where group frequencies were quite unimportant.
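The commutativity argument of Example 36.2 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original text; the function names `one_way_matrices`, `matmul` and `commute` are our own, and the group sizes are arbitrary illustrations.

```python
def one_way_matrices(sizes):
    """Build A0 = 11' and the block-diagonal A1 for given group sizes n_i."""
    n = sum(sizes)
    a0 = [[1] * n for _ in range(n)]
    a1 = [[0] * n for _ in range(n)]
    start = 0
    for size in sizes:
        # Each group contributes a (size x size) block of units on the diagonal.
        for i in range(start, start + size):
            for j in range(start, start + size):
                a1[i][j] = 1
        start += size
    return a0, a1


def matmul(a, b):
    """Product of two square matrices held as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commute(a, b):
    """True if AB = BA exactly (integer arithmetic, so no tolerance needed)."""
    return matmul(a, b) == matmul(b, a)


print(commute(*one_way_matrices([3, 3, 3])))   # balanced: True
print(commute(*one_way_matrices([2, 3, 4])))   # unbalanced: False
print(matmul(*one_way_matrices([2, 3]))[0])    # row of A0 A1: [2, 2, 3, 3, 3]
```

The last line displays the column pattern described in the example: every element in the columns belonging to a group equals that group's size.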

36.9 As Example 36.2 indicates, we can only expect (36.21) to hold in general for the balanced case. We now proceed with our investigation, bearing this restriction in mind.

The normality and independence of the p random variables $u_j$, j = 1, 2, ..., p, implies that the (correlated) variables $y_1, y_2, \dots, y_n$ are multinormally distributed with

    $E(y) = X_0 u_0 = 1\theta$

and dispersion matrix $V_y$ given by (36.11). The quadratic form in the exponent of their multinormal distribution is therefore

    $Q = (y - 1\theta)' V_y^{-1} (y - 1\theta),$    (36.22)

distributed in the chi-squared form with n d.fr. by 15.10.

36.10 Now, because of the commutativity condition (36.21), there exists an orthogonal matrix P which simultaneously diagonalizes all the $A_j$, so that

    $P A_j P' = D_j,$    (36.23)

where $D_j$ is a diagonal matrix. Moreover, we may choose P so that one row (say, its first, denoted by $P_1$) has elements all equal to $n^{-1/2}$ and

    $P_j 1 = 0,\quad j \ne 1,$    (36.24)

where $P_j$ is any set of rows of P not including $P_1$.

(36.10-11) show that W and $V_y$ are also diagonalized by P, the leading diagonals of $PWP'$ and $PV_yP'$ respectively containing the latent roots of W and those of $V_y$. It follows at once from (36.24) that these two sets of latent roots (which are all positive) coincide except for the first: if the roots of W are $\lambda_j$, j = 1, 2, ..., n, those of $V_y$ are



62 


•> 

6j- 
, * 1 , p 

If s ,s th . ,* 



„ of sT A ' rlSTlC a . functions of the 

,o 1**°** ,re of C ° Ut 

0** (toots ,; stin ct roots of V,, 

A . -these la 1 ber of * e other roo t; 

1 - * the ^ ^ A,, A* coincide 


d 5* the nU -ncid e wit £ !,f Ai, ** coincid e 
ofW , does • th er of ?° dthat 


m rtietc^s ^ uer °* d do eS * n t 5 ^ f hat 

-iat afll the m> mb 1 wh efl / _ oVerS ed; °* 4l ^ s hoW * ha 

%n)> 


Tf j is tut' •; __ S „1 wh er \* re ver sed ’^g( 3 l) sh°' v 
may have ' f this si tuat !f n nd tf^ 18 . <r .. 

,tfitha*° ther s . 


with a 110 ’ 


t dition. 


(36.25) 
before, (M) 


subject to a f j, identic®'J^- plO). l, e fore, tj U ' *) 

pP' =* piflV (PV y P) ' . c: rs t row aS . e( f Thus 

i6tl Since P* n-(Py"^ , V V is the f is ign° rea ‘ 

36 . e u p ...,r.. where f Ltootij^ enAl 

«, oartition P i nt0 . l ’d multiplicity 0 , cotl ies 
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where If is * he that the t statistic 3 

We S ee at on ^ y 'P)P,y, J = 2 > 3 ’ ‘ ^ n0 other statistic enters 
c . . - r the (k+2) parameters of the model, ^ sufficient. 

£ Sod Function. It £ ”^^ 33^8 if « is not equal to any other 
The proof Of this follows directly by Graybill and Hultquist (1961), who 

of the X } \ if it is s0 e ^ ual > x f PJJjjL . - m 35^5 below to show that if t - s — k + 2 

36.10) the set of minimal sufficient statistics is complete. 


h/h 



(36.26) 


5 


36.12 (36.26) shows directly that the statistic $P_1 y$ has a marginal univariate normal distribution with mean $n^{1/2}\theta$ and variance $\lambda_1^* = P_1 V_y P_1'$. Further, each of the other (t - 1) components of the minimal sufficient set is a quadratic form

    $Q_j = \lambda_j^{-1} y' P_j' P_j y,\quad j = 2, 3, \dots, t,$

in multinormal variables. Each matrix $\lambda_j^{-1} P_j' P_j V_y$ is idempotent, since $P_j V_y P_j' = \lambda_j I$. Furthermore, since $P_j V_y P_l' = 0$, $j \ne l$, the sum of the $Q_j$,

    $Q^* = \sum_{j=2}^{t} \lambda_j^{-1} y' P_j' P_j y,$

also has the property that its matrix

    $\sum_{j=2}^{t} \lambda_j^{-1} P_j' P_j V_y$

is idempotent. Thus Q* = 


The idempotendes j ^ 

nrA Air, j. *i . - J 


are distrib 


A, 


P y he result that the O i 

ln the y} distrihnf* 2 

l distribution, which 
because the non-central parameter is $(1\theta)' \lambda_j^{-1} P_j' P_j (1\theta) = 0$ by (36.24). Since the result above for $P_1 y$ is equivalent to the quadratic form $Q_1 = y' \lambda_1^{*-1} P_1' P_1 y$ having a non-central $\chi^2$ distribution with 1 d.fr. and non-central parameter $n\theta^2/\lambda_1^*$, we see finally that the t quadratic forms $Q_j$ have ranks adding to n, the rank of their sum Q at (36.26).

Each is therefore independent of each of the others by 35.7.

Analysis of variance in Model II 

36.13 We now observe an important distinction between Model I and Model II: in the latter, the SS $y'y$ cannot be decomposed into quadratic forms which are themselves independently distributed in the (central or non-central) chi-squared form, for $y'y$ itself is not so distributed, on account of the covariances between the observations; it is (36.22) which has that property instead. However, 36.11-12 show that the t quadratic forms $Q_j$ are so distributed. The matrices of these forms are

    $P_1'P_1/\lambda_1^*,\quad P_j'P_j/\lambda_j,\quad j = 2, 3, \dots, t,$

and since $\sum_{j=1}^{t} P_j'P_j = P'P = I$, we see that we may decompose $y'y$ into t quadratic forms which, when divided by the corresponding distinct latent roots of $V_y$, are independent chi-squared variables. Moreover, the degrees of freedom are simply the multiplicities of the roots, and all the forms except $Q_1$ have central distributions.

36.14 Since

    $E(Q_j) = n_j,\quad j \ge 2,$

we have at once

    $E(y'P_j'P_j y/n_j) = \lambda_j,\quad j \ge 2.$    (36.27)

The latent roots may therefore be found as the expectations of the corresponding Mean Squares (MS) in the AV table. Since

    $E(P_1 y) = n^{1/2}\theta,$

$\theta$ is estimated by the mean of all the observations $\bar{y}$, as is obvious. We require to estimate the other (k + 1) parameters, the variances. If the $\lambda_j$ are (k + 1) different functions of these parameters, (36.27) may be solved to give estimators of the parameters.

In the common case when the $\lambda_j$ are all linear functions of the parameters, (36.27) is particularly easy to solve to give unbiassed estimators of the $\sigma_j^2$.

Graybill and Hultquist (1961) show that an AV in our extended sense exists with the latent roots all different functions of the parameters if and only if the commutativity condition (36.21) holds and $W = E(yy')$ has s = k + 2 (the minimum possible number; see 36.10) distinct latent roots. Under the multinormality assumption, the set of sufficient statistics is then complete (cf. 36.11) and the estimators are unique MV unbiassed for their expectations, by 17.35. Graybill and Hultquist (1961) show that the MS in (36.27) remain MV unbiassed quadratic estimators of their expectations under weaker conditions than multinormality.

36.15 We are now in a position to connect our study of Model II with the Model I 
AV of classifications investigated in Chapter 35. Any Model I AV table is a decom¬ 
position of the SS y'y into component SS (of which one is for the general mean). 
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has the chi-squared form with {p- 1) d.fr., since it is a standardized sum of squares 
about the sample mean. The expected value of S^, the between-groups SS, is therefore 
(p-lK {a\+<rjn t ) and that of the MS is E{SJ(p- 1)} = crJ + j^oS = 2 a , say. 


We thus have the two independently distributed chi-square variables 
this is an optimum test in Model II, as we know it to be in Model I from general linear hypothesis theory. Second, although the test statistic is the same for both models, its distribution is the same only under the hypothesis of no difference between groups. Its power function must necessarily be different in the two models, since the alternatives are quite different in the two cases (cf. Exercise 36.1).

It will be seen by solving the expressions for $E(S_2)$ and $E(S_3)$ that

This unbiassed estimator of $\sigma_1^2$ can obviously be negative; it remains the MV unbiassed estimator since it is a function of the complete sufficient statistics ($\bar{y}$, $S_2$, $S_3$) (cf. 36.11).


Example 36.4 Balanced two-way cross-classification 

For the balanced two-way cross-classification, the Model I AV table ((35.65) and Exercise 35.1) will hold under Model II by 36.15, apart from its last column, which is specific to Model I. In the model (36.7), we now have k = 3. For convenience, we denote the three variances to be estimated by $\sigma_R^2$, $\sigma_C^2$ and $\sigma_{RC}^2$, indicating that they are respectively the variances of the Row, the Column and the Interaction (Row x Column) random variables u. It is easy to see, as in Example 36.2, that the commutativity condition (36.21) holds; indeed, considerations of symmetry make this obvious.

Leaving aside the general mean, the four MS must have their expectations evaluated.

As in Example 36.3, examination of the model, now written in an obvious notation

    $y_{ijp} = \theta + u_{i\cdot} + u_{\cdot j} + u_{ij} + e_{ijp},$    (36.28)
    (i = 1, 2, ..., r; j = 1, 2, ..., c; p = 1, 2, ..., m),

shows that the Residual SS, now denoted by $S_5 = \sum_i \sum_j \sum_p (y_{ijp} - \bar{y}_{ij\cdot})^2$, is identical with $\sum_i \sum_j \sum_p (e_{ijp} - \bar{e}_{ij\cdot})^2$ and has the same distribution as in Model I, so that its MS has expectation $\sigma_e^2$, and this will evidently be generally true for the balanced Model II. The Rows SS is now written

    $S_2 = cm \sum_i (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2 = cm \sum_i \{(u_{i\cdot} + u_{i(\cdot)} + \bar{e}_{i\cdot\cdot}) - (\bar{u}_{\cdot\cdot} + u_{(\cdot\cdot)} + \bar{e}_{\cdot\cdot\cdot})\}^2,$

where a bracketed dot denotes averaging of the interaction variables $u_{ij}$ over that suffix and $\bar{u}_{\cdot\cdot}$ is the average of the row variables, and as in Example 36.3,

    var $(u_{i\cdot} + u_{i(\cdot)} + \bar{e}_{i\cdot\cdot}) = \sigma_R^2 + \sigma_{RC}^2/c + \sigma_e^2/(cm).$

Thus the Rows MS has expectation

    $E\{S_2/(r-1)\} = \sigma_e^2 + m\sigma_{RC}^2 + cm\sigma_R^2,$    (36.29)

and by interchanging row and column symbols, we have similarly, for the Columns,

    $E\{S_3/(c-1)\} = \sigma_e^2 + m\sigma_{RC}^2 + rm\sigma_C^2.$    (36.30)

The Interactions SS is written

    $S_4 = m \sum_i \sum_j (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot})^2$
    $\;\;\;\; = m \sum_i \sum_j \{(u_{ij} + \bar{e}_{ij\cdot}) - (u_{i(\cdot)} + \bar{e}_{i\cdot\cdot}) - (u_{(\cdot)j} + \bar{e}_{\cdot j\cdot}) + (u_{(\cdot\cdot)} + \bar{e}_{\cdot\cdot\cdot})\}^2.$    (36.31)
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35.65) 


                    E(MS)
Rows:           $\lambda_2 = \sigma_e^2 + m\sigma_{RC}^2 + cm\sigma_R^2$
Columns:        $\lambda_3 = \sigma_e^2 + m\sigma_{RC}^2 + rm\sigma_C^2$
Interactions:   $\lambda_4 = \sigma_e^2 + m\sigma_{RC}^2$
Residual:       $\lambda_5 = \sigma_e^2$                    (36.33)
we find, for the first time, that we have to distinguish different choices of the divisor for the F-tests in an AV table.
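The way the expected mean squares of the balanced two-way Model II table determine both the variance estimators and the divisor of each variance ratio can be sketched as follows. This is an illustration only, assuming the expectations (36.29-30) together with $\sigma_e^2 + m\sigma_{RC}^2$ for the Interactions MS and $\sigma_e^2$ for the Residual MS; the function name `two_way_model2` and the dictionary keys are our own labels.

```python
def two_way_model2(ms_rows, ms_cols, ms_inter, ms_resid, r, c, m):
    """Solve the expected-MS equations for the variances and form the F-ratios.

    r, c are the numbers of rows and columns, m the common cell frequency.
    """
    estimates = {
        "sigma_e_sq": ms_resid,
        "sigma_RC_sq": (ms_inter - ms_resid) / m,
        "sigma_R_sq": (ms_rows - ms_inter) / (c * m),
        "sigma_C_sq": (ms_cols - ms_inter) / (r * m),
    }
    # Rows and Columns are tested against the Interactions MS,
    # Interactions against the Residual MS.
    ratios = {
        "rows": ms_rows / ms_inter,
        "columns": ms_cols / ms_inter,
        "interactions": ms_inter / ms_resid,
    }
    return estimates, ratios


# Feeding in the exact expectations for sigma_e^2 = 1, sigma_RC^2 = 2,
# sigma_R^2 = 3, sigma_C^2 = 4 with r = 3, c = 4, m = 2 recovers the variances.
est, ratios = two_way_model2(29.0, 29.0, 5.0, 1.0, r=3, c=4, m=2)
print(est)
print(ratios)
```

Any of the three differences may be negative in real data, in which case the corresponding estimate is negative.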
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Testing hypotheses in Model II AV 

36.16 The remarks in Examples 36.3-4 draw our attention to the fact that we have given no theoretical justification so far for the use of F-tests in Model II AV. Even where the test coincides with that of Model I, its power characteristics will be different and we cannot presume optimality in any case; new test statistics may be required. We must now consider the theory

of the obi Jo„; ifS SqMreS ' " "* ”ite the 
Now we observe that $\lambda_1^*$, defined as $\lambda_1 - n\theta^2$, is not a function of $\theta$ at all, for by 36.12 it is equal to $P_1 V_y P_1'$, i.e. $n^{-1}$ times the sum of all the elements of $V_y$, which are variances and covariances, and therefore free of $\theta$. In every case we shall consider, t = k + 2, the number of parameters to be estimated. Thus we have (k + 2) latent roots which are (in our case, always linear) functions of (k + 1) parameters $\sigma_j^2$. We can eliminate this redundancy by expressing $\lambda_1^*$ in terms of the other latent roots, so that (36.34) becomes

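    $dG \propto \exp\left[-\tfrac{1}{2}\left\{\frac{n\bar{y}^2 - 2n\bar{y}\theta}{p(\lambda_2, \dots, \lambda_t)} + \sum_{j=2}^{k+2} \frac{S_j}{\lambda_j}\right\}\right],$    (36.35)

where $p(\lambda_2, \dots, \lambda_t)$ is the expression for $\lambda_1^*$ in terms of the other latent roots.

Gautschi (1959) has shown the family of distributions

    $f(t \mid \tau) \propto \exp\left\{\sum_{j=1}^{r} \tau_j t_j + t_1^2\, g(\tau_2, \dots, \tau_r)\right\}$    (36.36)

to be complete, a result not covered by (23.19), Vol. 2. We see at once that (36.35) is a case of (36.36) with

    $t_j = S_j,\quad \tau_j = -1/(2\lambda_j),\quad j \ge 2;$
    $t_1 = \bar{y},\quad \tau_1 = n\theta/p(\lambda_2, \dots, \lambda_t);$

and

    $g(\tau_2, \dots, \tau_r) = -\tfrac{1}{2} n/p(\lambda_2, \dots, \lambda_t).$

Thus (36.35) is complete with this parametrization.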

36.17 We may now make use of the results of Chapter 23. We are debarred from finding UMPU tests by the methods of 23.27-36, since (36.36) is more general than (23.73). However, we may find UMP similar tests directly by the method of 23.20.

In our applications to balanced Model II AV, the latent roots are linear forms in the parameters as exemplified in (36.33). The hypothesis which we wish to test (that one particular $\sigma_j^2$, and that one alone, is zero) is always equivalent to testing that two particular latent roots are equal, as can be verified at (36.33) or in the simpler case of Example 36.3.

We first observe that $H_0: \lambda_q = \lambda_p$; q, p > 1 leaves us with a set of (k + 1) complete sufficient statistics

    $T = \{\bar{y}, S_2, \dots, S_{p-1}, S_{p+1}, \dots, S_{q-1}, S_{q+1}, \dots, S_{k+2}, (S_p + S_q)\}.$

Now we assume that $p(\lambda)$ in (36.35) is a function of $\lambda_p$ or $\lambda_q$, but not of both. Following 23.20, we see that every similar region for $H_0$ consists of a fraction $\alpha$ of each contour of constant T, which we now hold fixed. We write

    $\tfrac{1}{2}\left(\frac{S_p}{\lambda_p} + \frac{S_q}{\lambda_q}\right) = \tfrac{1}{2}\left\{\frac{S_p + S_q}{\lambda_q} + \frac{S_p(\lambda_q - \lambda_p)}{\lambda_p \lambda_q}\right\}.$    (36.37)

For fixed T, use of the Neyman-Pearson lemma (22.6) on (36.35) with (36.37) inserted shows that the uniformly most powerful size-$\alpha$ critical region for testing $H_0$ against $H_1: \lambda_q > \lambda_p$ is given by


-hSp^-^c^n 
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;. s is identical 


Residual: ”> .^thesis that a, -• * Example 
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ies not affect the test (see ux 


T&ample36.4,(36.33)shows at »ce *at ^ ^ e h q “ pothese s indicated 

» "or 6 '=0a, -U it is tempting to 
“ we iTVt “V-Tte'i^by testing So (or S,) against the pooled SS 
Evidently the increase in the d.fr. of the denominator of the variance ratio will bring an increase in the power of the test, but since the decision whether to pool $S_4$ with $S_5$ depends on the previous test, it may be wrongly taken when $\lambda_4 \ne \lambda_5$. As a result, control of the size of the overall test procedure becomes difficult. The numerically complicated theory, and recommendations for such pooling procedures, are treated by Bozivich et al. (1956) and Srivastava and Bozivich (1962).

Because the Interactions MS is the denominator for the tests of row-effects and column-effects, we can make these tests even when every cell frequency m = 1. This was not so in Model I (cf. Example 35.3) unless we were able to say that all interactions were zero. Here, only the test of $\sigma_{RC}^2 = 0$ is lost when m = 1 and the Residual SS is identically zero.

General balanced cross-classifications

<36 ’ 33) for the tw °- way — 
r balanced cross-classifications. For the 
three-way cross-classification, it will similarly be found that the MS have expectations given, in an obviously extended notation, by:


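                                        E(MS)
Rows (R):                     $\lambda_2 = \sigma_e^2 + m\sigma_{RCL}^2 + lm\sigma_{RC}^2 + cm\sigma_{RL}^2 + clm\sigma_R^2$
Columns (C):                  $\lambda_3 = \sigma_e^2 + m\sigma_{RCL}^2 + lm\sigma_{RC}^2 + rm\sigma_{CL}^2 + rlm\sigma_C^2$
Layers (L):                   $\lambda_4 = \sigma_e^2 + m\sigma_{RCL}^2 + cm\sigma_{RL}^2 + rm\sigma_{CL}^2 + rcm\sigma_L^2$
First-order Interactions
  (R x C):                    $\lambda_5 = \sigma_e^2 + m\sigma_{RCL}^2 + lm\sigma_{RC}^2$
  (R x L):                    $\lambda_6 = \sigma_e^2 + m\sigma_{RCL}^2 + cm\sigma_{RL}^2$
  (C x L):                    $\lambda_7 = \sigma_e^2 + m\sigma_{RCL}^2 + rm\sigma_{CL}^2$
Second-order Interactions
  (R x C x L):                $\lambda_8 = \sigma_e^2 + m\sigma_{RCL}^2$
Residual:                     $\lambda_9 = \sigma_e^2$                    (36.39)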


This corresponds to the model, generalizing (36.28),

    $y_{ijkp} = \theta + u_{i**} + u_{*j*} + u_{**k} + u_{ij*} + u_{i*k} + u_{*jk} + u_{ijk} + e_{ijkp}.$    (36.40)

Each expected MS has $\sigma_e^2$ plus m times a linear function of the variances. This linear function contains every variance in the model which includes among its subscripts all the identifying letters of the MS. The coefficient of each such variance is the product of the upper limits of the suffixes in (36.40) corresponding to letters not included among the subscripts of the variance; if all letters are included, the coefficient is unity. Thus, e.g., considering the expectation for the MS for the (R x L) Interactions, the only variances containing both R and L among their subscripts are $\sigma_{RL}^2$ and $\sigma_{RCL}^2$. $\sigma_{RL}^2$ omits only C from its subscripts, and the corresponding suffix in (36.40) is j, with upper limit c. $\sigma_{RCL}^2$ includes all subscripts, and gets the coefficient unity. Thus we obtain $(c\sigma_{RL}^2 + \sigma_{RCL}^2)$, to be multiplied by m and added to $\sigma_e^2$, as given in (36.39). More general rules of formation of expectations of MS, including this balanced Model II rule as a special case, are given by Cornfield and Tukey (1956) (see also Scheffé (1959)).
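The rule of formation just stated is mechanical, and may be sketched in a few lines of Python. This is our own illustration of the balanced Model II rule only; the function name `expected_ms`, the single-letter factor labels and the key "e" for the error variance are assumptions made for the example.

```python
from itertools import combinations


def expected_ms(factors, limits, m):
    """Coefficients of each variance in each expected MS, balanced Model II.

    factors: string of factor letters, e.g. "RCL".
    limits:  dict giving the number of levels of each factor.
    m:       common cell frequency.
    Returns {MS label: {variance label: coefficient}}.
    """
    terms = ["".join(combo)
             for size in range(1, len(factors) + 1)
             for combo in combinations(factors, size)]
    table = {}
    for ms in terms:
        coeffs = {"e": 1}  # every expected MS contains the error variance once
        for var in terms:
            # Keep only variances whose subscripts include all letters of the MS.
            if set(ms) <= set(var):
                coef = m
                for letter in factors:
                    if letter not in var:
                        coef *= limits[letter]  # upper limit of an omitted suffix
                coeffs[var] = coef
        table[ms] = coeffs
    table["Residual"] = {"e": 1}
    return table


table = expected_ms("RCL", {"R": 2, "C": 3, "L": 4}, m=5)
print(table["RL"])   # {'e': 1, 'RL': 15, 'RCL': 5}
print(table["R"])    # {'e': 1, 'R': 60, 'RC': 20, 'RL': 15, 'RCL': 5}
```

With r = 2, c = 3, l = 4 and m = 5 the (R x L) row reproduces $\sigma_e^2 + m\sigma_{RCL}^2 + cm\sigma_{RL}^2$, and calling the function with two factors gives the two-way pattern.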


36.19 (36.39) reveals a new feature of the three-way cross-classification, which persists for all higher-order cases. In Examples 36.5-6, we saw that each hypothesis of interest in the one-way and two-way cases (that some variance is zero) was identical with the hypothesis of equality of two expected MS. In (36.39) it will be seen that this remains true so far as $\sigma_{RCL}^2$, $\sigma_{RC}^2$, $\sigma_{RL}^2$ and $\sigma_{CL}^2$ are concerned: these are respectively zero if and only if $\lambda_8 = \lambda_9$, $\lambda_5 = \lambda_8$, $\lambda_6 = \lambda_8$ or $\lambda_7 = \lambda_8$. Thus the second-order and all first-order interactions can be tested by UMP similar F-tests (cf. 36.17). The situation is different for the other variances $\sigma_R^2$, $\sigma_C^2$ and $\sigma_L^2$.

Consider $\sigma_R^2$, for example, contained in $\lambda_2$. $H_0: \sigma_R^2 = 0$ cannot be expressed in terms of the equality of two $\lambda_j$. Instead we observe that $\lambda_2 + \lambda_8 \ge \lambda_5 + \lambda_6$ and that





(36.41) 
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„ Hierarchical ^ ti0n 

2 0 BaUncd W°' w 

a r the model _ 0 +*^+%"^ p 1 2 • • ■ ’ ^ 

,. .. S; 1 - ' ’, „ f classification, (the ' The AV table 

ling to k groups at the first ^ of the kl sub '^Q n '(*) an d we need • 
of these, all * obse He necessary changes in "»« ■ We n 

' f *tfthof its MS (acept thejnet ^ ^ ^ .. 

tot e (36 P 42) h a degenerate for ® ° f fj' 2’o;,<r! for our present purposes. 

eolahpl «... as Ul. We also write 4 and ^iic as ^i> ^2 tota i multiplicity 


relabel as 


=d value of «y.y“U in wh ich we neeo out, put 
!) is a degenerate form of 1™. b ff | forolir present purposes. 

HToH 36 d 33 ) i nre equal, with a total multiplicity 
ttifr. for sub-groups. (36.33) now gives: 

                 E(MS)
Groups:       $\sigma_e^2 + m\sigma_2^2 + lm\sigma_1^2$
Sub-groups:   $\sigma_e^2 + m\sigma_2^2$
Residual:     $\sigma_e^2$                    (36.43)

Evidently, UMP similar tests of $\sigma_1^2 = 0$ and of $\sigma_2^2 = 0$ are available from 36.17.


Example 36.8 Balanced three-way hierarchical classification

becomes (the reader is kf "tTrC'aT&eS ‘ PreV ' 0US Exam P le > (36.39) 



'•groups, ’ P "' llCUlarly ,h »< ' in Example 35.S 


corresponds to kl hero, the 
                     E(MS)
Groups:           $\sigma_e^2 + m\sigma_3^2 + l_3 m\sigma_2^2 + l_2 l_3 m\sigma_1^2$
Sub-groups:       $\sigma_e^2 + m\sigma_3^2 + l_3 m\sigma_2^2$
Sub-sub-groups:   $\sigma_e^2 + m\sigma_3^2$
Residual:         $\sigma_e^2$                    (36.44)

Here there are $l_1$ groups each containing $l_2$ sub-groups, each of which has $l_3$ sub-sub-groups, and m observations in each of the latter, making $n = l_1 l_2 l_3 m$ observations in all. There is no difficulty in obtaining tests by the method of 36.17. The problem which arose for higher-way cross-classifications in 36.19 was a result of interactions, which do not enter into the present hierarchical case. Scheffé (1959) gives an extended treatment of the three-way hierarchical case with possibly unequal numbers.


Power of tests, confidence intervals and negative estimates

36.21 By 36.17, a UMP similar F-test for the hypothesis that some variance is zero is equivalent to testing $H_0: \phi = 0$ in $\lambda_q = \lambda_p + \phi$ against $H_1: \phi > 0$. The ratio of $S_q/(\lambda_p + \phi)$ to $S_p/\lambda_p$ always has a central F-distribution, so that

9 P ~ ' “ ’ (36.45) 


This leads immediately to the power of the test of $H_0$ based on the statistic $S_q/S_p$, Exercise 36.1 being the simplest case.


36.22 Whereas in Model I we were led to consider multiple comparisons between the parameters (means) of the model, the natural next step in Model II is to consider confidence intervals for the variance parameters.

(36.45) leads immediately to confidence intervals for the parameter $\phi/\lambda_p$, of which Exercise 36.11 is the simplest case. These intervals may cover (or even consist entirely of) negative values, as even that simplest case shows. This runs parallel to the possible negativeness of the point estimators of variances which we found in Examples 36.3-4. For practical purposes, a negative point estimate of a non-negative quantity is inadmissible, and is therefore usually replaced by the estimate zero (although this removes the unbiassedness of the estimator). Similarly, the negative portion of a confidence interval is usually replaced by the value zero.

Thompson (1962) gives an algorithm for obtaining non-negative estimates of variances which gives intuitively acceptable results in the one-way and two-way cross-classifications, but becomes complicated even in the three-way case.

Bulmer (1957) (see also Scheffé (1959)) obtains approximate confidence intervals for $\phi$ itself in (36.45), rather than the less useful intervals for $\phi/\lambda_p$ already mentioned (cf. Exercise 36.15).

There is no difficulty in constructing confidence intervals for the error variance $\sigma_e^2$ from the distribution of the Residual SS in every case; these are never negative.
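The practice of replacing a negative point estimate by zero may be illustrated for the balanced one-way classification. The sketch below is ours, not the authors'; it assumes p groups of m observations each, with the between-groups MS having expectation $\sigma_e^2 + m\sigma_1^2$ and the within-groups MS expectation $\sigma_e^2$, and the function name is hypothetical.

```python
def one_way_variance_estimate(groups):
    """Return (raw, truncated) estimates of sigma_1^2 for balanced groups."""
    p = len(groups)
    m = len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("balanced data required: all groups must have equal size")
    n = p * m
    means = [sum(g) / m for g in groups]
    grand_mean = sum(means) / p
    ms_between = m * sum((mu - grand_mean) ** 2 for mu in means) / (p - 1)
    ms_within = sum((y - mu) ** 2
                    for g, mu in zip(groups, means) for y in g) / (n - p)
    raw = (ms_between - ms_within) / m   # unbiassed, but may be negative
    return raw, max(raw, 0.0)            # truncation at zero loses unbiassedness


# Equal group means give a zero between-groups MS and a negative raw estimate.
print(one_way_variance_estimate([[1, 5], [2, 4], [3, 3]]))
# Well-separated groups give a positive estimate, left unchanged by truncation.
print(one_way_variance_estimate([[1, 2], [5, 6], [9, 10]]))
```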
which may be verified by multiplication. Thus $V^{-1}$ contains the $V_i^{-1}$ along its leading diagonal, and zeros otherwise.


36.25 The logarithm of the LF for the one-way classification is, from 36.24 

J f 9 . / -t \ o .. 7 > 
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* „ y. /», it is obvious from (36.46) that the (p +1) statistics (S {ju, ^ ^ 
SinceX. i •>* ^ suffident for th thre e parameters o^e Ptob 

»*> 

the result of 36.11 for the balanced case. These are, of course, essentially the quantities in the AV table discussed in Example 36.3.

If the $n_i$ are not all equal, (36.47) does not hold, and the minimal sufficient set has more than three components (cf. Exercise 36.12).

36.26 The AV is severely affected by the lack of balance. The “between groups” SS

    $S_2 = \sum_{i=1}^{p} n_i (\bar{y}_{i\cdot} - \bar{y})^2$

is no longer distributed as a multiple of a chi-squared variable, since it is a weighted sum of squares about their mean of normal variables with zero means but unequal variances. However, the distribution of the Residual SS $S_3$ is unchanged, so that

    $E\{S_3/(n-p)\} = \sigma_e^2.$    (36.48)

One can still estimate $\sigma_1^2$ from the AV table, but this is no longer a unique procedure as it was in the balanced case of Example 36.3. We saw in 36.25 that $S_3$ and the p group means are always sufficient statistics. Consider the function of the group means

    $S(m_i) = \sum_i m_i \left\{\bar{y}_{i\cdot} - \frac{\sum_j m_j \bar{y}_{j\cdot}}{\sum_j m_j}\right\}^2,$

where the $m_i$ are constants. Since, from Example 36.3,

    var $\bar{y}_{i\cdot} = \sigma_1^2 + \sigma_e^2/n_i,$


we see that E{S(m x )) = 1 E m<. 


1 1 


rii n 




(36.49) 

(36.50) 


(36.50) can be solved with (36.48) to give an unbiassed estimator of $\sigma_1^2$, whatever the $m_i$ used, and we thus obtain a multiplicity of unbiassed estimators of $\sigma_1^2$ (except in the balanced case (cf. Exercise 36.13)). The “natural” choice $m_i = n_i$, which reduces $S(m_i)$ to $S_2$, is convenient but has nothing else in general to recommend it.

This lack of uniqueness demonstrates that (as usual when the dimension of the vector of sufficient statistics exceeds the number of parameters) we have lost the completeness of the sufficient statistic in the unbalanced case.
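The multiplicity of unbiassed estimators can be made concrete. The sketch below is our own: it assumes uncorrelated group means with variances $\sigma_1^2 + \sigma_e^2/n_i$, from which a direct calculation gives $E\{S(m_i)\} = \sum_i m_i(1 - m_i/M)(\sigma_1^2 + \sigma_e^2/n_i)$ with $M = \sum_i m_i$; this expression is our derivation under that assumption and is not quoted from (36.50). All function names are hypothetical.

```python
def weighted_between_ss(group_means, weights):
    """S(m_i): weighted sum of squares of group means about their weighted mean."""
    total = sum(weights)
    centre = sum(w * g for w, g in zip(weights, group_means)) / total
    return sum(w * (g - centre) ** 2 for w, g in zip(weights, group_means))


def expectation_coefficients(weights, sizes):
    """Return (a, b) such that E{S(m_i)} = a*sigma_1^2 + b*sigma_e^2."""
    total = sum(weights)
    a = sum(w * (1 - w / total) for w in weights)
    b = sum(w * (1 - w / total) / n for w, n in zip(weights, sizes))
    return a, b


def estimate_sigma1_sq(groups, weights):
    """Unbiassed estimator of sigma_1^2 for a chosen set of weights m_i."""
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    n, p = sum(sizes), len(groups)
    residual_ss = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g)
    sigma_e_sq_hat = residual_ss / (n - p)   # unbiassed for the error variance
    a, b = expectation_coefficients(weights, sizes)
    return (weighted_between_ss(means, weights) - b * sigma_e_sq_hat) / a


groups = [[1, 2, 3], [4, 6], [7, 8, 9, 10]]
sizes = [len(g) for g in groups]
print(estimate_sigma1_sq(groups, sizes))      # weights m_i = n_i
print(estimate_sigma1_sq(groups, [1, 1, 1]))  # equal weights: a different value
```

In the balanced case with weights equal to the common group size m, the coefficients reduce to m(p - 1) and (p - 1), in agreement with the balanced expectation.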

36.27 Tukey (1956-7) considered the problem of optimum estimation in the 
unbalanced one-way classification, with complicated results which should be consulted. 
Searle (1958) and Low (1964) investigated the unbalanced two-way cross-classification, 
where again the SS in the AV table are no longer chi-squared multiples. Henderson 
(1953) considered several methods of unbiassed estimation in the general unbalanced 
case. 
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Gen ;.28 Both the LS^ - mo** jle 0 the i. e . where 

S^eMthet models is aP (36-51) , 

in uncord fixture oft para meter-vector 6 1 

* re ; - * from * he p 
j f r the general me 

.here9, la noW “* • f A V situations (which 

of constants- t0 discussion n) we can 

fine ourselves from the d untl l 36.13 c s ;der for dlus- 

36.29 “/'"fchapter 35 for (36 . 5 1) may arise- 

we did not do » ^ >( mode i s of the of five different 

easily see how ss _ c l a ssification- ^ breaking strong t are to be 

tration a two-w y • ent is to investig 1 0 f moisture co lumns 

Suppose that an expc difiereni i<- a nd tnre 

typ « of paper when we. - d ^ give . table wrth to^ for the expen- 

Sd with each VP' ofPf The five types of paper h*« these papers behave 

for the results of the tests- interest _ we want to tno c(jnstants (parameters) 

me „t because they are o 1 d .heir hreahng stren£ as the row . 

It is therefore reasonable to mg« ^ of det erminat.on. 

subject to the usua ex j . quite appropriate. three levels of 

“rrr“"- s.» , 

moisture-content will probity '^ b being nothing sacrosanct about the . 

“ Ugh, ” “ medium and low conte , ^ ^ ^ have been chosen f rom a 

precise levels used. In this sense, t . / not necessarily probabilistic) 

Jopulafionotpoientialleveho "„dTtoribution, quite apart from 

^ ri JeSt"ras 1 column-classification is concerned therefore, 
Model II (without the normality assumption) is a more reasonable idealization of the experiment, though by no means a perfect one. We are therefore led in the first instance to represent this experiment by a model of the form (36.51), with $\theta$ standing for row-effects and u for column-effects.


36.30 We may, further, consider the interactions of rows and columns in the experiment of 36.29, i.e., as in 35.18, allow for the possibility that the five types of paper have different relative patterns of breaking strengths at the different levels of moisture-content. Since the column-effects are themselves idealized as random variables it seems logically necessary that their interactions with the rows must also be random variables, rather than constants. We easily achieve this by allowing u in (36.51) to have further components to represent the interactions.

we 
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, introduction of the normality assumption into the model then implies 
gjj» that the imruu 

rC tpendencel' ,. indepe ndence of random effects and interactions can hardly 

On g f ® un ^ jylodel II, while retaining its mathematical interest, seems unli te y 

be alio’wed, an practical situations involving interactions 

d0 full justice to ma y y 


36.31 The fact that Model II AV is identified with the generally unacceptable assumption of independent interactions, as we shall now call it, stimulated the development of more general AV models which could be freed from this assumption. We shall see that models with tied interactions, i.e. interactions correlated with the corresponding main effects, do indeed lead to a different AV procedure from that of either Model I or Model II. Recent expositions, with some historical detail, are given by Plackett (1960) and Scheffé (1956b).

We now examine AV models developed by Cornfield and Tukey (1956) (following earlier unpublished work by these authors), Scheffé (1956a), and by Wilk and Kempthorne (1955-6). We confine our discussion to the two-way cross-classification, which displays all the important features of the models.


A general model

36.32 Suppose that we have a population (discrete or continuous) of possible
levels for the row-classification, from which r levels are selected for use in an
experiment, and call this population $P_R$. Similarly, suppose that there is a population
$P_C$ of possible levels for the column-classification, from which the c levels used are
selected. The selection process for rows is assumed independent of that for columns.
If row-level i and column-level j are selected, observations are to be made on this
combination. The pth observation in the ith row and jth column will be written
$y_{ijp}$ as before, and we let

    $y_{ijp} = \mu_{ijp} + \varepsilon_{ijp}$,    (36.52)

where $E(\varepsilon_{ijp}) = 0$. We leave aside for the moment the detailed consideration of how
the $n_{ij}$ required experimental units are to be allocated to the (i, j)th selected row-column
combination (see 36.36 and 36.39 below), but we let $N_{ij}$ denote the number of
experimental units which could (by the structure of the experiment) so be allocated, and
define $\mu_{ij*}$ to be the average value of $\mu_{ijp}$ calculated over its $N_{ij}$ possible values.
Further, $\mu_{i**}$ and $\mu_{*j*}$ are averages of $\mu_{ij*}$ over all the levels in $P_C$ and all the
levels in $P_R$ respectively, while $\mu_{***}$ is an average of $\mu_{ij*}$ over both $P_R$ and $P_C$.
No assumption of normality or homoscedasticity is made.


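36.33 The general mean, row-effects, column-effects and row-column interactions
are now respectively defined, by analogy with (35.31), as

    $\theta_{..} = \mu_{***}$,
    $\theta_{i.} = \mu_{i**} - \mu_{***}$,
    $\theta_{.j} = \mu_{*j*} - \mu_{***}$,
    $\theta_{ij} = \mu_{ij*} - \mu_{i**} - \mu_{*j*} + \mu_{***}$.    (36.53)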
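
The definitions of the general mean, row-effects, column-effects and interactions can be illustrated numerically in the simplest situation. The following is only a sketch, assuming finite populations of levels in which every level carries equal weight, so that all averages are plain arithmetic means; the function name `decompose_cell_means` and the example table are hypothetical and are not part of the original text.

```python
def decompose_cell_means(mu):
    """Split a table of cell means mu[i][j] into a general mean, row-effects,
    column-effects and row-column interactions.

    Assumption: every row-level and column-level has equal weight, so each
    average is an ordinary arithmetic mean.
    """
    r, c = len(mu), len(mu[0])
    grand = sum(sum(row) for row in mu) / (r * c)
    row_means = [sum(row) / c for row in mu]
    col_means = [sum(mu[i][j] for i in range(r)) / r for j in range(c)]
    row_eff = [m - grand for m in row_means]
    col_eff = [m - grand for m in col_means]
    inter = [[mu[i][j] - row_means[i] - col_means[j] + grand
              for j in range(c)] for i in range(r)]
    return grand, row_eff, col_eff, inter


# Hypothetical 2 x 3 table of cell means.
table = [[10.0, 12.0, 17.0],
         [14.0, 13.0, 12.0]]
grand, row_eff, col_eff, inter = decompose_cell_means(table)
```

By construction the row-effects, the column-effects, and each row and column of the interaction table sum to zero, and adding the four components for any cell recovers that cell's mean.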
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case. 


(36.55) 


36.34 We now specialize the general model in various ways.

Case 1

$P_R$ has $R = r$ members only, and $P_C$ has $C = c$ members only, and no sampling
of row- and column-levels takes place; $N_{ij} \to \infty$ for all i, j.

This is the basis for Model I AV in Chapter 35. (36.53) becomes identical with
(35.31).

Case 2

Both $P_R$ and $P_C$ are continuous populations, so that R and $C \to \infty$, as does $N_{ij}$
for all i, j.

This is the basis for Model II AV in this chapter, although we have here made
no assumption of normality or homoscedasticity. The last equation of (36.53) gives

    $\mu_{ij*} = \theta_{..} + \theta_{i.} + \theta_{.j} + \theta_{ij}$,    (36.56)

and all the terms on the right except $\theta_{..}$ are random variables. The interactions $\theta_{ij}$
are now effectively made independent of the $\theta_{i.}$ and $\theta_{.j}$ by the infiniteness of the
population from which they are sampled. In fact, (36.56) is equivalent, apart from
changes of notation, to (36.28) (written for $y_{ij*}$ instead of individual observations
$y_{ijp}$).

Case 3 

This is a mode, j “ “ m P °- P '° f the row Sections, 


b “^C:^^d!^*'-r andom 


interactions Tt , a mcatl0ns have random a. 

• II reduces to Case 2 whenfi c^nd’fV. " M ° dd H ’ 
It should be noted that even when $r = R$, $c = C$ and $N_{ij} \to \infty$, Case 3 is not
identical with Case 1 above, since there was no sampling at all in Case 1, whereas
here the sampling process determines the order in which the r rows and c columns are
numbered 1, 2, ..., r and 1, 2, ..., c respectively. The sampling process effects
a randomization of the rows and (independently) of the columns, of the r×c table.
We shall call this Case 3(1).


Mixed models

36.35 Case 4

In Case 3 above, suppose that $R = r$, so that there is only a permutation of rows,
which are otherwise fixed, but that, as in Case 2, C and $N_{ij} \to \infty$.

This is a mixed model, with row-levels fixed apart from permutations but the
column-levels selected from an infinite population. The column-vector $\{\mu_{ij*}\}$, with
r components obtained by giving i the values 1, 2, ..., r, is a random vector because
of the selection of columns. For different j, the $\{\mu_{ij*}\}$ are assumed mutually
independent with the same multivariate distribution. However, this model is not as
general as might at first appear, for the row-permutation process implies that the
variances of the elements of $\{\mu_{ij*}\}$ are all equal, and similarly that their $\tfrac12 r(r-1)$
covariances are equal. This condition of complete symmetry is clearly not always
desirable in practice. We therefore consider a further generalization, due to Scheffé
(1956a), which we shall call Case 5.

Case 5

$R = r$ as in Case 1, with no sampling, $C \to \infty$, and as in Case 4, the $\{\mu_{ij*}\}$ are
identically distributed random vectors. The dispersion matrix of their r components no
longer has complete symmetry necessarily imposed upon it by a row-permutation
process as in Case 4. This is therefore a more general mixed model than that of
Case 4.

Imhof (1960) generalizes Case 5 to the balanced three-way classification with two 
random (and one fixed) classification variables. 


36.36 Cornfield and Tukey (1956) give a detailed treatment of Case 3 of 36.34
(of which Cases 2 and 4 are specializations) not only for the two-way cross-classification
to which we are confining ourselves, but for general balanced classifications. In all
cases, they assume that for each selected row-column combination the $n_{ij}$ observations
are made upon experimental units selected at random without replacement from a
distinct population of $N_{ij}$ experimental units, and that these rc separate populations
are all sampled independently of the row- and column-levels selections already
discussed. For the balanced two-way cross-classification with all $n_{ij} = m$ and also all
$N_{ij} = N$, their results for the expected MS in the AV table are given in the first two
columns of (36.57). The remaining three columns specialize these results as indicated.






On the other hand, despite the dist _ identical with the Model I 

** dieentr ies , the C« W ^mn chi-squared variate 

results in Example 35.2 at (35.65), wh narameter and its dir. and use the 

fcJriomSwith fiTc = C. The randomization of rows and columns in 
Case 3(1) does not affect the expected MS because the SS are symmetric in row-levels
and in column-levels.


36.37 The entries under Case 4 in (36.57) are new to us, and we see that the
expected MS for Rows (which are fixed apart from permutation) is identical with its
value in Model II (where row-effects are random), while the expected MS for Columns
(which are random here) is identical with its value in Model I (where column-effects
are fixed). In fact, the expected MS for Rows is determined by the sampling in the
columns classification, and vice versa. The Rows MS is concerned with differences
between row-levels, all of which are observed, but in association with only a sample
of column-levels; the population interaction between row- and column-levels is
therefore relevant and appears in the expected MS. The observed sample of
column-levels, however, is associated with all the row-levels, so that no interaction
term enters the expected MS for Columns. The same is true in more complex
classifications: the sampling of levels of the other classifications than the one under
consideration determines the structure of the latter's expected MS.


36.38 The expected MS in Case 5 are the same as those given for Case 4 in (36.57).
As in the Case 3(1) discussion at the end of 36.36, this is because the SS are symmetric
functions of the rows. However, an important distinction between Cases 4 and 5
arises as soon as we introduce the multinormality assumption upon the $\{\mu_{ij*}\}$. It
then appears (see Scheffé (1956a, 1959) for a detailed discussion) that the mixed model
of Case 5 does not retain the simplicity of Model II, where we found (cf. 36.15-17)
that the ratio of each SS in the AV table to its expected MS has a central chi-squared
distribution, and as a consequence were able to obtain UMP similar F-tests by testing
each MS against another with the same expectation on the hypothesis. It remains
true in Case 5, as the last column of (36.57) suggests, that (if $m > 1$) $\sigma^2_{RC} = 0$ may be
tested by an F-test on Interactions MS/Residual MS, and $\sigma^2_C = 0$ may be tested
similarly. But the statistic Rows MS/Residual MS does not in general have an
F-distribution, even though its numerator and denominator are independent with the
same expectations when $\sigma^2_R = 0$. Scheffé (1956a, 1959) gives an alternative test
statistic distributed in the Hotelling's $T^2$ form to be discussed generally in 41.15-17.
In Case 4, however, the statistic Rows MS/Residual MS remains distributed in the
variance-ratio form: the symmetry saves the distribution.



S. N. Roy and Cobb (1960) consider the mixed model (no interactions) with normal 
errors and one or more random effects which are non-normally distributed. 

We mention briefly that some sequential AV procedures have been developed by 
D. R. Cox (1952), Johnson (1953—4) and Ghosh (1964), whose paper should be consulted 
for further references in this field. 


Allocation of experimental units: randomization 

36.39 Despite the complications into which the proliferation of AV models has
led us, we have not even begun to consider an important source of variability in most
experimental data.

In 36.32 we left aside the question of the allocation of experimental units to the
various row-column combinations to be used, and we saw in 36.36 that the general
model there designated as Case 3 assumed that the $n_{ij}$ units allocated to a selected
row-column combination come from a distinct population of $N_{ij}$ units, so that there are
rc populations of experimental units. This is an extreme situation: the populations
of units do not overlap at all.

At the other extreme is the situation where all the experimental units to be used
(e.g. rcm in the balanced two-way classification) are selected at random without
replacement from a single population of N units. Here $N_{ij} = N$ for all i, j, and there is
complete overlap, so to speak. This method of allocation is called complete
randomization and any experiment employing it is a completely randomized experiment.
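
Complete randomization as just described can be sketched in a few lines. This is an illustration only, assuming the balanced two-way case with m units per row-column combination drawn from one population of labelled units; the function name `completely_randomize`, the seed, and the example sizes are hypothetical.

```python
import random


def completely_randomize(population, r, c, m, seed=None):
    """Allocate m units to each of the r*c row-column combinations by drawing
    r*c*m units at random, without replacement, from a single population."""
    if r * c * m > len(population):
        raise ValueError("population too small for the requested design")
    rng = random.Random(seed)
    drawn = rng.sample(population, r * c * m)  # sampling without replacement
    allocation = {}
    k = 0
    for i in range(1, r + 1):
        for j in range(1, c + 1):
            allocation[(i, j)] = drawn[k:k + m]
            k += m
    return allocation


# Hypothetical example: N = 100 units labelled 0..99, r = 2, c = 3, m = 4.
plan = completely_randomize(list(range(100)), r=2, c=3, m=4, seed=1)
```

Every combination shares the same population of units, which is the "complete overlap" of the text; drawing each combination's units from its own separate population instead would correspond to the other extreme.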

There are also obviously intermediate situations of partial overlap, where groups 
of row-column combinations share the same population of experimental units, and 
there are more than one but less than rc such populations. These would still be called 
randomized allocations (though not “completely randomized”). For example, each 
36.41 Randomization of experimental units should be taken into account
in the analysis of the data, but this leads to complications, and we need
some new definitions.

Explicitly, we wish to allow for the possibility that $\mu_{ijp}$, defined at (36.52)
for a selected row-column combination, may itself
... contains $M$ experimental units, $M > m$. We now make a formal two-way classification
of group-members against units. We emphasize that (i, j) is now being treated as a
single classification by bracketing these suffixes on the right of the identity

    $\mu_{ijp} \equiv \mu_{(*)*} + \{\mu_{(ij)*} - \mu_{(*)*}\} + \{\mu_{(*)p} - \mu_{(*)*}\} + \{\mu_{(ij)p} - \mu_{(ij)*} - \mu_{(*)p} + \mu_{(*)*}\}$,    (36.58)

    $\phantom{\mu_{ijp}} = \mu_{(ij)*} + \{\mu_{(*)p} - \mu_{(*)*}\} + \{\mu_{(ij)p} - \mu_{(ij)*} - \mu_{(*)p} + \mu_{(*)*}\}$,    (36.59)

where asterisks denote averaging as before.

(36.58) resolves $\mu_{ijp}$ into a "general mean," two "main effects," and an
"interaction." If both terms in braces in (36.59) are identically zero, the allocation of
experimental units to row-column combinations is irrelevant to $\mu_{ijp}$, for then it equals
its average over all units. The first term in braces in (36.59) will be called the unit error
of the experimental unit concerned. The second term in braces there will be called
the interactive error for the experimental unit and row-column combination concerned.
(More usually it is called the unit-treatment interaction.)

36.42 The resolution of $\mu_{ijp}$ into three components at (36.59), superposed
on the underlying two-way classification scheme set out in 36.33, makes the model
complicated. Moreover, the error term in (36.52), which we now call the technical
error to distinguish it from the unit and interactive errors defined above, also carries
the suffixes i, j, p, and this leads us to expect difficulties in estimation of the various
components of the model. The technical error is one arising from inaccurate
measurement or observation. The unit and interactive errors arise purely from the
allocation of experimental units to the row-column combinations.

36.43 Wilk and Kempthorne (1955—6) discuss the one-, two- and three-way cross¬ 
classification in Case 3 of 36.34, including the unbalanced case, when there is complete 
randomization of experimental units. For the case of proportional frequencies, an 
orthogonal AV is always possible (cf. 35.21-2). For general (non-proportional) fre¬ 
quencies, a non-orthogonal AV using unweighted means of levels is used (cf. 35.31). 

The difficulties anticipated at the end of 36.42 duly arise: only certain functions of
the parameters can be estimated. Moreover, as Plackett (1960) remarks, it is hardly
natural to regard unequal frequencies as fixed numbers when row- and column-levels
are being sampled; however, the addition of yet another sampling process to
determine the $n_{ij}$ (which might also be correlated with the observed values of y) would
complicate the analysis even further.

36.44 We shall defer further discussion of randomization models until Chapter 38, 
where we shall examine their rationale. We have examined their effects on AV pro¬ 
cedures sufficiently closely for our present purpose, and we conclude this chapter with 
some discussion of the implications of its contents. 

The choice of an AV model 

36.45 The plethora of models now available for AV presents the applied statistician 
with a problem which, in less acute forms, arises in the use of statistical techniques 
generally. Evidently, careful analysis of the known facts concerning the origins of the
observations must be undertaken before a model can be chosen which reasonably
represents the real-life situation; and where there is little such knowledge, a good deal 
of guesswork may be necessary. In this respect, the statistician is experiencing a 
situation familiar in almost every field of applied science. 

It is worth pointing out that the varieties of AV discussed in 36.32-8 differ in their 
assumptions about the methods of selection of the levels of the factors being analysed, 
and not in any assumptions about the real nature of these factors or of the variables 
underlying them. Provided that the data arise from a designed experiment, the 
assumptions are concerned with the behaviour of the experimentalist rather than that 
of his material. On the other hand, the complications of 36.39-43 are essentially 
concerned with the nature of the material being experimented on. It must always 
be a matter for the experimenter to judge whether his experimental units differ enough 
to make these added complications in the analysis worth while—in social, agricultural 
and biological work they sometimes do, and in physical and industrial experimentation 
they often do not. Our present point is that, even when the observations arise by 
deliberate design, there is ample scope for intuitive skill in such judgements. A fortiori,
when the observations are not the result of a designed experiment, the validity of the
chosen analysis will depend on the insight of the statistician. 

36.46 It will be clear, then, that AV, like other statistical techniques, is not a mill 
which will grind out results automatically without care or forethought on the part of 










.nVANCED THEORY O ■ ents which can b e 

thE ADVAH nt of dehcate ^ as hard work .„ u Se . 

jf i s rather, an requ ires skiU, been ) applied to prov c 

tnc =« tisti f" s e when aPP^Tthough they somet^s ^ statxstictan rnu st 

brought »<® ues need not be (th ? pec tion from th . ppro priate analyse 

Elaborate techn«ju« obvtous o m P ^ gqu«U y, ^ of stati stics, b 

SHf""" 

T thT!togi^ tatled en,huS,aSt - con cerned with AV techniques, 

, hfL i as t of our thiee tun a y may be used 

36.47 In the next chapter^ ^ trans f orm ing the , da f ta di gt ri bution-free methods i n 
EXERCISES

36.1 In Example 36.3, show that the power function of the size-a test 
against Hi : <*1 > 0 is 


Pioi) = l-G\ (l+«i 


°1 

at, 


G a 


where C is the distribution funchon of thewi* appropm*' degrees* 

«Tni—icin ,?:^ C “., e .haT.heLt is unbiased, and that itjs consistent as group 
size ^ increases to infinity, but not if the number of groups p alone -> oo. 

36.2 In 36.17, show that since y is a component of T, the term exp { — iny 2 /p(P)} in (36.35) 
will not affect the UMP similar test of H 0 against H r even if p{X) is a constant multiple of ?. p 
or 3 q . 

t 

36.3 In (36.26), show that the sum of all n latent roots, A b + 2 ).j nj, equals the 

j= 2 

the variances of the n observations. Hence show in Example 36.5 that l\ = A,. 

UMPU si™! ‘ he UMP S,mikr ° ne ' Sided i?_ * eStS ° f 3647 are unbiassed , and are thus 

itirr 36 ’ 5 . For the balanced one-way classification in Examnl^c ui , r , 

ML estimators of A 2 and A 3 are P eS ail( * ^6.5, show that the 
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never the test statistic F<—^— . 
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. nf F Verify that the 

P ^ i is a monotone decreasing function 


p * ^ 15 j a u* v **- 

further that for p - 1 d _i_ for «< 0-25 , so that for all practical purp 


Sho^ 

critical^ 8 
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•valent to the UMP similar F-test. (Herbach, 1959) 

1% test is eqU1 


„ the balanced two-way croas-ctai«cation ^ UMP ^ “ 

"* eq " (Herbach - ,959> 
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ShOW also a X 2 variable with d.fr. 


were 
estimated by 


, = X 2 /Z(cfAj/vj), 


/ = sv^syv]). 


• fp "p-test of (36.41) may be based on the ratio 
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36 , Verify (36.39) and also (36.44) in Example 36.8. 

3 ,,o in Exampie 36.3, show that die variance of ,he unbiassed estimator of oi , ,ven y 
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2<t|' 


oo. (Cf. Exercise 


■ ^nci'crpnt as t> —> oo, but not if tn alone 

and hence that the estimator is consistent as P 

36.1, where test consistency require » ?) g . yes general expressions for 

variances and covariances of estimators of variances.) 


. , • for r? an d for a\/ai from the distributions 

36.11 Iri Pirampie ^btiun confidenceunlatter intervals may be partly or wholly below 

of S 3 and SJS 3 respectively. Show that tne ia 

the value zero. 


rr value 

• • 1 ffioinnf statistic for the three parameters has 

36.12 In 36.25, show that the minimal sufficient a 
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Writing 
where 


!7t- 


/*(y) = n {1 +(y-l)V^> 


-9 k 
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(Box, 1954) 


36.15 If SAi + Q and SJl are independent *> variables with A, £.fr. /^nfpoim 1 
Af< = T = MJMi (the variance-ratio statistic), and F a j 2 is the lUUa pel c t point 

of the distribution function of F with $(f_1, f_2)$ d.fr., show that if

    $P\{M_2\, g(F) < \phi\} = \alpha$

for a monotone increasing function g(F), it should satisfy the conditions

    $g(F_{\alpha, f_2}) = 0$,

    $g(F) \sim F/F_{\alpha, \infty}$ as $F \to \infty$.

Show that these are satisfied by

    $g_1(F) = (F - F_{\alpha, f_2})/F_{\alpha, \infty}$

and by

    $g_2(F) = (F/F_{\alpha, \infty}) - 1 + (F_{\alpha, f_2}/F)\{1 - (F_{\alpha, f_2}/F_{\alpha, \infty})\}$.

(Bulmer (1957) showed that $g_1$ is a poor, and $g_2$ a remarkably
good, approximation; see also Scheffé (1959).)





CHAPTER 37 


ASSUMPTIONS OF THE ANALYSIS OF VARIANCE

37.1 When it has been decided that a particular model is appropriate to a given
situation, an important problem remains for consideration. Although natural
considerations of convenience or technique may dictate that the observations be made on
a variable y, it has still to be decided which function of y is to be used for the purpose
of the analysis. There is no reason why the quantity measured, rather than some
function of it, should be best suited to the assumptions of the model.

There may, indeed, be compelling practical reasons for the consideration of a
particular function of y, say g(y) (which may simply be y itself): for example g(y) may
be directly related to the cost or the profitability of a process under investigation. But
this implies only that the conclusions of the analysis should finally, for practical
purposes, be expressed in terms of g(y); it certainly does not justify the presumption
that the model is better satisfied by g(y) than by any other function of y.

37.2 Putting the problem slightly more formally, we may say that a set of
observations on y are, equally, a set of "observations" on any well-defined function g(y).
The question is, which "observations" g(y) shall we use? Evidently, we must try to
determine the function which as nearly as possible satisfies the model. The search
for a transformation of this kind was first treated generally by Box and Cox (1964),
whose investigation is generally applicable to the linear model (with normal errors)
of which the AV Model I in Chapter 35 is a specialization. The reader will observe
that the preceding introductory paragraphs are not restricted to the AV context, for
the problem is a general one.

Transformations to the normal linear model 

37.3 Following Box and Cox (1964), suppose that we observe a dependent variable
y and a set of regressor variables $x_1, x_2, \ldots, x_k$, and that we wish to employ the linear
model with normal errors. However, we are not prepared uncritically to assume that
we may validly write

    $y = X\theta + \varepsilon$;

rather, we seek transformations both of y and of each of the x's so that we have

    $y^{(\lambda)} = X^{(\mu)}\theta + \varepsilon$,    (37.1)

where the components of $\varepsilon$ are independently normal with zero means and constant
variance $\sigma^2$. In (37.1), $\lambda = (\lambda_1, \lambda_2, \ldots)$ indexes the transformation of y within some
selected parametric family of transformations, and similarly $\mu = (\mu_1, \mu_2, \ldots, \mu_k)$ indexes
the (separate) transformations of the regressors $x_1, x_2, \ldots, x_k$. We are thus
generalizing our introductory discussion, where only transformation of y was envisaged.
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( 37 . 2 ), 


H^' ; i <■”£-. *;£«» x £E2 , 


( 37 . 3 ) 


, j,theJ aC 0 B Ae”actu^ oD T;r w 'e find 35 before, 

*"*>)» b ; ct to e *» d * constan ,s, ther 

variab \ a, w‘ th r \ Tf we neg le 

f° rgiVen mes a constant- . \ 0 o]» 

ternr beco ^ ^ p,, log ^ . 2 4 28- . m 

m „iroum . £ (y|M) ain as m 241 . ; a ( maxima 37 . 3 ) 

e ■ the Residual SS, ag he condium compu ting, a 

-j* -J - 1 i-H r- i3£~ 

involved, e.g. wh j s transformed 

(a) only the dependent var subset 0 f, the regressors, so that 

ponents; or . . g a ppii e d to all of, 

(t) W ° C ° m ^ nt andW holds with only one component in p. 

(c) X has a single component as i» ( ), of (37 . 2 ) for all X, P will 

IU\ nnrl (c) numerical plotting of the w here only the dependent . 

In ,rte w! now confine ourselves to 0 -l variables 

generally be necessary. Tn AV oroblems, where the regressor& <u . 

variable is being transformed Inp " ifed ’ data> this is not a restriction of conse- / 

(cf. 35.9-10) since we are dealing with choose proper forms 

uence. In more general regression the dependent variable, 

fnr thp regressor variab es before considering transformation o F . f 

Box and Tidwell (1962) discuss transformations of the regressors to simpler form
(cf. Exercise 37.9); such transformations do not, of course, affect the normality or
homoscedasticity of the errors.


37.5 Returning, therefore, to the purpose outlined in our initial discussion, we
consider transformations of y alone. In practice, the most useful transformations have
been found to be the powers and the logarithm of y, possibly translated by a constant.
We therefore consider the family of transformations

    $y^{(\lambda)} = (y + \lambda_2)^{\lambda_1}$,    $\lambda_1 \neq 0$,
    $y^{(\lambda)} = \log(y + \lambda_2)$,    $\lambda_1 = 0$.    (37.4)

To avoid a discontinuity at $\lambda_1 = 0$, we rewrite this equivalently as

    $y^{(\lambda)} = \{(y + \lambda_2)^{\lambda_1} - 1\}/\lambda_1$,    $\lambda_1 \neq 0$,
    $y^{(\lambda)} = \log(y + \lambda_2)$,    $\lambda_1 = 0$.    (37.5)
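
The continuity at a zero power that motivates the form (37.5) is easy to see numerically. The following is a minimal sketch, assuming the power-and-logarithm family with a translation constant as described above; the function name `power_transform` and its argument names `lam1`, `lam2` are illustrative choices, not notation from the text.

```python
import math


def power_transform(y, lam1, lam2=0.0):
    """Translated power transformation in the form (37.5):
    {(y + lam2)**lam1 - 1}/lam1, or log(y + lam2) when lam1 = 0.
    Requires y + lam2 > 0."""
    shifted = y + lam2
    if shifted <= 0:
        raise ValueError("y + lam2 must be positive")
    if lam1 == 0:
        return math.log(shifted)
    return (shifted ** lam1 - 1.0) / lam1


# As lam1 shrinks towards 0 the value approaches log(y) smoothly.
for lam1 in (1.0, 0.5, 1e-6, 0.0):
    print(lam1, power_transform(2.0, lam1))
```

Subtracting 1 and dividing by the power changes each transformation only by a linear function, so the analysis is unaffected, but the family then passes continuously through the logarithm.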

cnt.al equation which it satisfies, 

(W/y»)'} = (A,—l)-i. 
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, (1962) give tables to facilitate fractional power 

° f °' 2 - 


In (37.3), we now have


    $\log J = (\lambda_1 - 1)\sum_i \log(y_i + \lambda_2)$,


when 


(37.6) 


be plotted for selected (A„ « for the Residual SS 

and (37.3) £v must be carried out for e*MA fay equa ting to aero 
* ‘ be Using (37.5-6), this grves 

‘thifirot derrvatrve o ) ^ } _ S log*, ( ' 

o = -. ax, y>, T y'- 

„ h ere the elements of u are {*** *>*■ 

U ——affiS) present some interesting numerical erf££ 

377 B of to method of finding a -"tC'They consLrto resoluIon of 

■PP®*®?®? analysis which they develop. In ad ^° ’ * the nor mality, the homo¬ 
method . three components corr p Their procedure is of general 

SK» c„n.»b.*-• JtSKl 

^Consider sets of constra R of A when all of C v C „ • • • , u. “ 

-ft-J ML «*— — » — ‘ 

applied. A wuu . « x 

’ identically for any *, _ j,(y | ( (1) ) L(y | A<») 

i(y|i w ) = i(yi 3 )-i^A)-L(yTi(.))''' ‘ (37g) 

= L(y\ X).lih • • • n r r 

where $l_p$ is the LR test statistic for testing the set of constraints $C_1, C_2, \ldots, C_{p-1}, C_p$
against the set $C_1, C_2, \ldots, C_{p-1}$. Each of the $l_p$ lies between 0 and 1,
and under regularity conditions $-2\log l_p$ is asymptotically a non-central $\chi^2$ variable
with d.fr. equal to the number of constraints upon parameters imposed
by $C_p$ (cf. 24.7). When $C_p$ holds, this becomes a central $\chi^2$ variable, and thus $-2\log l_p$
may be used to test the value of adding $C_p$ to the already imposed $C_1, C_2, \ldots, C_{p-1}$.

It should be observed that the $l_p$ are not in general independently distributed, though
in particular cases they may be independent under certain hypotheses (cf. Exercises 24.6
and 24.13, and the more general result ...). The application of the
resolution (37.8) to the present problem is left to the reader as Exercise 37.1, since
it follows immediately from some results given ...
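
The ML approach can be sketched for the simplest case, in which only y is transformed by a pure power (no translation constant) in a one-way classification. The sketch assumes the standard result that, for a fixed power, the maximized log likelihood is, apart from an additive constant, minus half the sample size times the logarithm of the mean Residual SS of the transformed observations, plus the log Jacobian of the transformation. The grid search, the data and all function names are illustrative assumptions and not the authors' computing procedure.

```python
import math


def transform(y, lam):
    """Power transformation of a positive y; the logarithm when lam = 0."""
    return math.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / lam


def profile_loglik(groups, lam):
    """Maximized log likelihood (additive constant omitted) for a one-way layout."""
    n = sum(len(g) for g in groups)
    rss = 0.0
    log_jacobian = 0.0
    for g in groups:
        z = [transform(y, lam) for y in g]
        zbar = sum(z) / len(z)
        rss += sum((v - zbar) ** 2 for v in z)              # Residual SS
        log_jacobian += (lam - 1.0) * sum(math.log(y) for y in g)
    return -0.5 * n * math.log(rss / n) + log_jacobian


# Hypothetical data: additive, with constant spread, on the logarithmic scale.
errors = (-0.3, -0.1, 0.1, 0.3)
groups = [[math.exp(mean + e) for e in errors] for mean in (1.0, 2.0, 3.0)]

grid = [k * 0.25 for k in range(-8, 9)]                     # powers from -2 to 2
best = max(grid, key=lambda lam: profile_loglik(groups, lam))

# LR-type statistic (-2 log of a likelihood ratio) for "no transformation needed".
lr_stat = 2.0 * (profile_loglik(groups, best) - profile_loglik(groups, 1.0))
print(best, lr_stat)
```

For these constructed data the grid maximum falls at the logarithm, and the statistic comparing it with the untransformed scale may be referred to a chi-squared distribution with one degree of freedom, subject to the usual asymptotic caveats, which are strained with so few observations.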

The purposes of transformation ^ ■ requires no 

37.8 The virtue of the ML approach discussed^ rf the 4 riaWre of 
prior knowledge of the relationship between y an assumption that 

the error distribution of the untransformed y. U « arts 


G 







grfjCS 

n »y OF sTA . ich all the conditio^ 
nVAN cED Tl1 nS id e r ed for !f the error distribution 
THE AD family c0 , jorinainy but even then, tl„. 

...nation >“ “.Jasticity a ; nd „ot be **’ an improvement) 

eds tsse roctra “ S '-Aine h0 ' I ' 0S ‘-«'irse, this ore sumably_. , fividen ced by t|„, 


. rma tion > n “‘‘Lticity » ni „ot be **’ an improvement 

' «ists som<= t«“ a , u(lin g hom° s f c coUfS e, th ‘® st pr esum3b fact (evidenced by ^ 
linear of * 1 rti cular e® es ’ formation a st rikmg formation is ofte, 

0 e Satisfied- In P choice of the » form- I (bat this ML * {be nature of th e j 

«• ^ Co :tlS be undertaken wherev et ! 

^fng the monotont 

very d° se ia bles. SuC ", u jde to the * et hod ot n , ult ably scaled) 

„„derly»8 var ument and g“ ute r-based n ;dua l fab f • d 

s ” d c " 


^ples, with several others. needs 0 f the linear 

, - ncr the data to meei or t o stabilize 

379 Other approaches toitransfe seek elWe r to normalme t^ ; ^ (he hope is 
, odel have been tes ^u. s0 that effects are ^ ^ ^ h elp towards 


ad el have be^ ^“eracti^s so «b* « ** help towards 

the goals of additivity 

s .0 be always so. It is e^y ,0 con ^ cro ss-classification the expected value 
d homoscedasticity eonftct, for tf n a ™ ^ ^ are non . n0 rmally distributed T 

r b a«iv« in r0 l™f any ttansformation to remove the heteroscedacity will , 

happen to the non-normality. 
We now examine these different types of transformation in turn.


Variance-stabilizing transformations 

37.10 Suppose that a statistic t has mean $\theta$ and variance, for fixed sample size n,

    $\operatorname{var} t = D_n^2(\theta)$.    (37.9)

To eliminate this dependence of variance on the parameter $\theta$, we seek a function
u(t) such that var u is a constant, c. In general, however, we are unlikely to be able
to achieve this precisely, so we ask only that

    $\operatorname{var} u(t) = c\{1 + O(R^{-1})\}$,    (37.10)

where R is some known constant which is large enough for terms of order $R^{-1}$ to be
negligible. In particular we may have $R = n$, the sample size. We now assume t to be
confined to a neighbourhood of its mean $\theta$. The argument of 10.6-7 then applies, and
we have from (10.14) the approximation

    $\operatorname{var}\{u(t)\} = \left\{\left[\frac{du(t)}{dt}\right]_{t=\theta}\right\}^2 \operatorname{var} t$.    (37.11)

If (37.10) and (37.11) are equated, we have the first-order approximation

    $\left\{\left[\frac{du(t)}{dt}\right]_{t=\theta}\right\}^2 = \frac{c}{D_n^2(\theta)}$.    (37.12)

Since we are considering only the neighbourhood of $\theta$, we drop the suffix
and write

    $\frac{du(\theta)}{d\theta} = \frac{1}{D_n(\theta)}$,    (37.13)

where we drop the constant c without loss, since this is in any case at choice:
multiplication of u(t) by a constant will not affect our purpose of achieving (37.10). We
integrate (37.13), again ignoring the additive constant without loss, since (37.10) is
unaffected. We obtain the indefinite integral

    $u(\theta) = \int \frac{d\theta}{D_n(\theta)}$.    (37.14)


37.11 Although (37.14) was arrived at through approximation, we can check its
validity in any particular case where the distribution of t is known, by examining the
theoretical variance of u(t) for its stability as $\theta$ varies; it may be found desirable to
modify u(t) to improve the stabilization. Where, on the other hand, we have only
observations upon t and no prior knowledge of its distribution or of the parameter $\theta$
of that distribution, we cannot apply (37.14) precisely. In such cases, the mean and
variance of t in separate groups of observations are calculated, and the latter plotted
against the former to suggest an empirical relationship (37.9), on which the
transformation (37.14) may then be based. Such a procedure is somewhat hazardous, but
nevertheless often gives satisfactory results in practice.
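
When the relation between standard deviation and mean is known or has been fitted, the integral (37.14) can be evaluated numerically even where it has no convenient closed form. The sketch below is illustrative only: it uses a plain trapezoidal rule, fixes the arbitrary additive constant by starting the integration at a hypothetical lower limit `theta0`, and the function name `stabilizing_transform` is not from the text.

```python
import math


def stabilizing_transform(D, theta, theta0=1.0, steps=10000):
    """Approximate the integral of 1/D from theta0 to theta (trapezoidal rule).

    D is the standard deviation of the statistic as a function of its mean;
    the choice of theta0 only shifts the result by an additive constant."""
    h = (theta - theta0) / steps
    total = 0.5 * (1.0 / D(theta0) + 1.0 / D(theta))
    for k in range(1, steps):
        total += 1.0 / D(theta0 + k * h)
    return total * h


# Standard deviation equal to the square root of the mean:
# the integral from 1 to 9 is 2*sqrt(9) - 2*sqrt(1) = 4, a square-root scale.
u_sqrt = stabilizing_transform(math.sqrt, 9.0)

# Standard deviation proportional to the mean:
# the integral from 1 to e is log(e) = 1, a logarithmic scale.
u_log = stabilizing_transform(lambda th: th, math.e)
```

The two checks recover, up to constants, the square-root and logarithmic transformations that arise when the variance is proportional to the mean and to the squared mean respectively.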

Example 37.1

For the Poisson distribution, (5.20) shows that mean and variance are equal,
so (37.9) is here simply

    $D_n^2(\theta) = \operatorname{var} t = \theta$,

and (37.14) gives

    $u(t) \propto t^{1/2}$,    (37.15)

a simple square-root transformation. To the first order, by (37.11),

    $\operatorname{var}(t^{1/2}) = \{\tfrac12\theta^{-1/2}\}^2 \operatorname{var} t = \tfrac14$.    (37.16)


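The stabilization can be improved in this case by re-locating t before taking the
square root. If we define

    $u_c(t) = (t + c)^{1/2}$,

Bartlett suggested using $c = \tfrac12$; Exercise 37.15 shows that $c = \tfrac38$ is preferable.
The table on the following page gives the variance of $u_c(t)$ as a proportion of its limiting
variance as $\theta \to \infty$, for $c = 0$, $\tfrac12$ and $\tfrac38$, from calculations made by Bartlett (1938)
and Anscombe (1948).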
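
The effect of re-locating before taking the square root can be checked by direct summation over the Poisson probabilities. This is a sketch only: the function name `var_sqrt_poisson`, the truncation rule for the infinite sum, and the particular means used are illustrative assumptions, and the constants 0, 1/2 and 3/8 are simply the choices usually associated with the plain square root, Bartlett and Anscombe.

```python
import math


def var_sqrt_poisson(theta, c=0.0):
    """Variance of (t + c)**0.5 when t is Poisson with mean theta,
    computed by summing over the probability function."""
    # Terms beyond roughly twelve standard deviations are negligible.
    upper = int(theta + 12.0 * math.sqrt(theta) + 30.0)
    mean_u = 0.0
    mean_u2 = 0.0
    for k in range(upper + 1):
        prob = math.exp(k * math.log(theta) - theta - math.lgamma(k + 1))
        mean_u += prob * math.sqrt(k + c)
        mean_u2 += prob * (k + c)
    return mean_u2 - mean_u ** 2


# Compare each variance with the limiting value 1/4.
for theta in (2.0, 10.0, 50.0):
    print(theta, [round(var_sqrt_poisson(theta, c), 4) for c in (0.0, 0.5, 0.375)])
```

Dividing each printed variance by 1/4 expresses it as a proportion of the limiting variance, which is the form in which such comparisons are usually tabulated.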
Example 37.2

In samples from a normal distribution, we know from our earlier work that if t
is the sample variance and $\sigma^2$ the population variance, $z = nt/(2\sigma^2)$ is a Gamma
variate with parameter $\tfrac12(n-1)$, i.e. the distribution of z is

    $dF(z) = \frac{1}{\Gamma\{\tfrac12(n-1)\}}\, e^{-z} z^{\frac12(n-1)-1}\, dz$.

The mean and variance of z are therefore each equal to $\tfrac12(n-1)$, and those of t
itself are

    $\theta = E(t) = \frac{2\sigma^2}{n}\cdot\tfrac12(n-1) = \sigma^2(n-1)/n$,

    $D_n^2(\theta) = \operatorname{var} t = \left(\frac{2\sigma^2}{n}\right)^2\cdot\tfrac12(n-1) = 2\sigma^4(n-1)/n^2$,

so that here

    $D_n^2(\theta) = 2\theta^2/(n-1)$.    (37.17)

(37.14) gives

    $u(t) \propto \int \frac{dt}{t} = \log t$,

and we have arrived at the simple logarithmic transformation. Since

    $\log t = \log z + \log(2\sigma^2/n)$,

the cumulants of $\log t$ and of $\log z$ are identical apart from the constant difference in
the first cumulant, and the higher cumulants do not depend upon $\sigma^2$ at all. The
c.f. of $\log z$ is, writing $p = \tfrac12(n-1)$,

    $\phi(w) = \frac{1}{\Gamma(p)}\int_0^\infty e^{-z} z^{p-1+iw}\, dz = \Gamma(p+iw)/\Gamma(p)$.

If p is integral (n is odd), this becomes

    $\phi(w) = \frac{(p-1+iw)(p-2+iw)\cdots(1+iw)\,\Gamma(1+iw)}{(p-1)(p-2)\cdots 1\cdot\Gamma(1)}
             = \Gamma(1+iw)\prod_{s=1}^{p-1}\left(1+\frac{iw}{s}\right)$,

so that the cumulant-generating function is

    $\psi(w) = \log\Gamma(1+iw) + \sum_{s=1}^{p-1}\log\left(1+\frac{iw}{s}\right)$.    (37.18)

Now $\Gamma(1-iw)$ is the c.f. of the extreme-value distribution (14.66), with cumulants
(cf. Exercise 14.21)

    $\kappa_1 = \gamma$ (Euler's constant, 0.577 ...),

    $\kappa_r = (r-1)!\sum_{s=1}^{\infty} s^{-r}$,    $r \geq 2$.    (37.19)

Thus the cumulants of $\log z$ obtained from (37.18) are

j»-l 
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d r f(w) 
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/ ' / t0 normality °f.*he dls « ndnU p ton = 20 

at which point the a, 
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rntered an instance of this procedure ii 

e-stabilizing ^-transformation at ( 16 . 81 ) 

>7.3 below. 
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Wl + t®Y|- ! 


do) 


■ju < 37 - 26 > 

edure in Hotelling’s improved 
be j ow • 06.81) (Vol. 1)—cf. Exercises 

procedure could evidently be iterated further if this were j 

further applications of a sinele variant. ■ 
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already examined the “-Fishetm^d of obtain; 
,. t i r ■ „ polynomial transformation; and in 6.27-35, we discu J 

Curtiss (1943) gives a discussion of the limiting normality of transformed variates, with applications, particularly to those discussed in our Examples and Exercises. We shall give examples of the fact, mentioned in 37.9, that a transformation designed to achieve one purpose (e.g. variance stabilization) often also helps to achieve another (e.g. normalization). In addition, Exercise 37.16 treats the case dealt with in one of our Examples, where the same thing occurs. However, the last of our examples will show that this harmony of purposes is only obtainable by not pressing for optimal achievement of both: variance-stabilizing transformations commonly normalize as a by-product, but do not produce the optimum normalization.

Example 37.3

We discussed at 16.33 Fisher's variance-stabilizing transformation of the correlation coefficient $r$. The latter was seen at (16.74) to have variance

$$\operatorname{var} r = \frac{(1-\rho^2)^2}{n} + O(n^{-2}),$$ (37.27)

where $\rho$ is the population correlation parameter. (37.14) applied to the leading term of (37.27) gives

$$z = u(r) = \tfrac12\log\left(\frac{1+r}{1-r}\right)$$

and the variance of $z$ was seen at (16.77) to be

$$\operatorname{var} z = \frac{1}{n-1} + \frac{4-\rho^2}{2(n-1)^2} + O(n^{-3}),$$ (37.28)

depending little upon $\rho$, so that variance stabilization is good. (16.78) showed that $z$ has skewness coefficient $\gamma_1$ of order $n^{-3/2}$ as against order $n^{-1/2}$ for $r$; $\gamma_2$ is of order $n^{-1}$ for both. It seems clear that the variance stabilization symmetrizes, and hence normalizes, as a by-product.

Application of (37.26) here gives

$$z^* = z - (3z + r)/(4n)$$ (37.29)

with variance further stabilized at $(n-1)^{-1} + O(n^{-3})$.
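The stabilizing effect described in Example 37.3 can be seen in a small simulation. This is a rough sketch only: the sample size, number of replications, seeds and the helper names `sample_r`, `variance` and `simulate` are all illustrative assumptions of ours. It draws bivariate normal samples for two values of $\rho$ and compares the sampling variances of $r$ and of $z = \tfrac12\log\{(1+r)/(1-r)\}$.

```python
import math
import random

def sample_r(n, rho, rng):
    """Correlation coefficient of n pairs from a bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(a)
        ys.append(rho * a + math.sqrt(1 - rho * rho) * b)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def simulate(n, rho, reps=4000, seed=1):
    """Return (var r, var z) estimated from reps simulated samples of size n."""
    rng = random.Random(seed)
    rs = [sample_r(n, rho, rng) for _ in range(reps)]
    zs = [math.atanh(r) for r in rs]   # atanh(r) = (1/2) log{(1 + r)/(1 - r)}
    return variance(rs), variance(zs)

if __name__ == "__main__":
    for rho in (0.0, 0.8):
        print(rho, simulate(20, rho))
```

With these settings the variance of $r$ should change severalfold between the two values of $\rho$, while that of $z$ should stay nearly constant.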

Example 37.4

We return to Example 37.2, where we saw at (37.22) that the variance-stabilized logarithmically transformed variable had

$$\gamma_1 = -p^{-1/2}, \qquad \gamma_2 = 2p^{-1}$$ (37.30)

asymptotically. The untransformed variable is seen from (16.6), with $p = \tfrac12\nu$, to have

$$\gamma_1 = 2p^{-1/2}, \qquad \gamma_2 = 6p^{-1},$$ (37.31)

so that the variance stabilization, as by-product, has halved skewness (changing its sign) and reduced kurtosis by a factor of 3.
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the AD yAN is the parameter. 

« 5 . , „ 2 is known, 'Ration as for the _IW 

gamp* it5 , „ 2 suppose now that in the saffl variance stabilization 

in Example 37-2- s “(^ p aid we are o forIIia tion. r 

h ^A" 14 ) g ^ a br^t(l 936 ) sh ° W: 9 o 150 

®»p' e lvalues given by Bar ^ ^ 40 9-0^ ^ 


f 


now 

in Ex: 
is good, as 


these ralne® S‘ ven * 

. 0 0-5 10 , 

var(P') 0 0 182 ”' 21 

,6.5-6. It has, by (!«•*>• 


,-1 


20 30 40 

z „ n ,10 0-242 

° the * 2 distribution, treated i n 

iroximation to the x 

(37.32) 

,6.5-6. it n», -a v - i# -*, y s - **"!’ , variab le (and better also 

nn o i\ f or the untransforme . Again, the 

distinct improvements over (i!731) ^ virtue the presen ) Wilson- 

than (37.30) for its log®*”'*f the normalization here. But note tn though 

variance stabilization has “P”^ J r om (16.13), even better y„ of ° r ^. e ^ ’j mat jon. 

Hilfertynormatot,«on67h ^ ^ gives t he better norm^ PP ^ ^ 

However, ^it°does'not stabilize^varimice X all , - 0* and the squa re-root 


\ 


37.14 The reader will see that we have taken it for granted that smaller values of $\gamma_1$ and $\gamma_2$ are equivalent to a closer approach to normality; but we know of no significant example where this assumption misleads us in choosing between normal approximations.


Transformations to additivity 

37.15 Although in practice it may be important to search for a scale on which 
effects are additive (i.e. interactions disappear) or nearly so, relatively little work has 
been done in this area as compared with normalization and variance stabilization. 
Some general procedures which have been proposed involve minimization, within a 
class of transformations, of the value of the test statistic used for the hypothesis that 
interactions are zero. In the two-way cross-classification, for example, we could minimize $S_3$ (or $S_3/S_R$) at (35.65) in the balanced case, or (35.69) in the case of one observation in each cell. It will be recognized that the test statistic is here being used to carry out a complex estimation procedure, and nothing but intuitive justification has so far been given for this method. Such additivity transformations are sometimes suggested by the analysis of residuals, which will be considered later in this chapter.

We have discussed elsewhere the transformation of observations, via ranks, to normal order statistics or normal scores, and in 31.71 mentioned its use in obtaining a distribution-free test for the one-way classification AV situation. The Probit and Logit transformations of percentages, respectively to normal and logistic distribution deviates, arise mainly in biological contexts, and are discussed elsewhere.

Removal of transformation bias

37.17 Whatever the purpose of a transformation, it often raises problems of presentation when the analysis is complete. In particular, estimators of means or of differences which are unbiassed on the transformed scale will not be so if the inverse transformation is made so that results may be presented in "natural" terms (cf. the Exercises). Adjustments of some kind must be made to remove the bias due to transformation. A general method of bias-reduction was given in 17.10. We now discuss an exact method of removing the bias.

Suppose that $u$ is normally distributed with mean $\mu$ and variance $\sigma^2$, and that the functions $\hat\mu$, $S^2/\nu$ are jointly sufficient statistics for these parameters, $\hat\mu$ being normally distributed with mean $\mu$ and variance $\lambda^2\sigma^2$, and $S^2/\sigma^2$, independent of $\hat\mu$, a $\chi^2$ variate with $\nu$ d.fr. In practice, we usually have $\lambda^2 = 1/n$ and $\nu = n - 1$ where $n$ is sample size. Now consider the function $t(u)$, which in our terms is the inverse transformation. Neyman and Scott (1960) (cf. also the succeeding paper by Schmetterer (1960)) showed that if $t(u)$ satisfies the second-order differential equation

$$t''(u) = A + Bt(u)$$

for constants $A$, $B$, the unique MV unbiassed estimator of the mean of the untransformed variable $\theta = E(t)$ is given by

$$\hat\theta = t(\hat\mu) + A(1-\lambda^2)S^2/(2\nu), \qquad B = 0,$$


6 = 


r 




l 

This series converges very rapidly, only a few terms usually being required for adequate 
accuracy. 

It follows that the bias of the crude estimator $t(\hat\mu)$, which is simply the inverse transformation of $\hat\mu$, is

$$E\{t(\hat\mu)\} - \theta = \begin{cases} -\tfrac12 A(1-\lambda^2)\sigma^2, & B = 0, \\ \{\theta + (A/B)\}\left[\exp\{-\tfrac12 B(1-\lambda^2)\sigma^2\} - 1\right], & B \ne 0, \end{cases}$$

and its absolute value is always a monotone decreasing function of $\lambda^2$. Since usually $\lambda^2 = 1/n$, the bias will increase with sample size.

The following are the most important special cases: 


Transformation        Inverse transformation   A    B    Bias                                                      Sign of bias
u(t)                  t(u)                               E{t(\hat\mu)} - \theta                                    when \lambda < 1

(t + c)^{1/2}         u^2 - c                  2    0    -(1 - \lambda^2)\sigma^2                                  Negative
\log(t + c)           \exp(u) - c              c    1    (\theta + c)[\exp\{-(1 - \lambda^2)\sigma^2/2\} - 1]      -sgn(\theta + c)
arc sin(t^{1/2})      \sin^2(u)                2   -4    (\theta - 1/2)[\exp\{2(1 - \lambda^2)\sigma^2\} - 1]      sgn(\theta - 1/2)
ar sinh(t^{1/2})      \sinh^2(u)               2    4    (\theta + 1/2)[\exp\{-2(1 - \lambda^2)\sigma^2\} - 1]     -sgn(\theta + 1/2)

† It will be seen that as $\lambda \to 0$ ($n \to \infty$), the bias for the square root transformation $\to -\sigma^2$.
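The square-root row of the table ($A = 2$, $B = 0$) lends itself to a quick simulation check. What follows is a sketch under stated assumptions: $u$ is taken to be exactly normal, $\lambda^2 = 1/n$, $\nu = n - 1$, and the parameter values, seed and function names are our own illustrative choices. It compares the crude back-transformed estimator $t(\hat\mu) = \hat\mu^2 - c$ with the corrected estimator $t(\hat\mu) + A(1-\lambda^2)S^2/(2\nu)$.

```python
import random

def crude_and_corrected(sample, c=0.0):
    """Estimators of theta = E(t) from observations u = sqrt(t + c), assumed normal."""
    n = len(sample)
    mu_hat = sum(sample) / n
    s2 = sum((u - mu_hat) ** 2 for u in sample)   # S^2, carrying nu = n - 1 d.fr.
    nu, lam2 = n - 1, 1.0 / n                      # lambda^2 = 1/n
    crude = mu_hat ** 2 - c                        # t(mu_hat): the inverse transformation
    # Square-root case: A = 2, B = 0, so we add A (1 - lambda^2) S^2 / (2 nu).
    corrected = crude + 2.0 * (1.0 - lam2) * s2 / (2.0 * nu)
    return crude, corrected

def simulate(mu=3.0, sigma=1.0, n=5, reps=20000, seed=7):
    """Average bias of the crude and corrected estimators over reps samples (c = 0)."""
    rng = random.Random(seed)
    theta = mu * mu + sigma * sigma                # E(u^2) when c = 0
    bias_crude = bias_corrected = 0.0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        crude, corrected = crude_and_corrected(sample)
        bias_crude += crude - theta
        bias_corrected += corrected - theta
    return bias_crude / reps, bias_corrected / reps

if __name__ == "__main__":
    print(simulate())
```

With $n = 5$ and $\sigma = 1$ the tabulated bias of the crude estimator is $-(1 - \tfrac15) = -0.8$, and the simulated averages should lie close to $-0.8$ and $0$ respectively.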






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Y OF S^ 1 ^ 5 at (37.16)- T hc f 

ti- obtained bi as r jg ( 7, v 

may be 

-*r5£tVrstf ^ with a 6 (37 - 33) 


10 o +w 0 + € 


. (A r x /)) matrix and 

.and **??.&£ Excise 19.1) 
v - - yi-y> 


0 is gi^n by the mo 

'“^e confix »" selV 

written in the form J „ like £ «>“ „„ a j s (ct. ^ 

1 is a vector of unto, ° f “ “ duc ;„g a g en ^l by the deviations *< 

to replace, in the Lo 
forming a vector 


(37.34) 

y-ly'V*> . . , fro m the column means 

/ w b th e deviations from 

a „ daI so» replace d.ee, ffl en.s» tf of W^y._ 


forming a matrix X. 


Thus we have

$$z'\mathbf{1} = 0, \qquad X'\mathbf{1} = \mathbf{0}.$$ (37.35)

We lose no generality, therefore, by assuming from the beginning that the model is

$$z = X\theta + \epsilon$$ (37.36)

and that (37.35) holds. Then the LS estimator of $\theta$ is, by (19.12),

$$\hat\theta = (X'X)^{-1}X'z.$$

We define the matrix $M = X(X'X)^{-1}X'$, and denote the vector of fitted values by

$$f = X\hat\theta = Mz$$

and the vector of residuals from the fitted model by

$$r = z - f = (I - M)z.$$

By (37.35),

$$r'\mathbf{1} = 0,$$ (37.37)

and since $M$ is idempotent, we also have

$$Mr = M(I - M)z = \mathbf{0}.$$ (37.38)

37.19 Now suppose that, after fitting the model, we construct a scatter diagram (cf. Example 26.7, Vol. 2) with the fitted values as abscissae and the corresponding residuals as ordinates. Since $z$ and $X$ are measured from their means, both the fitted values and the residuals have zero means. Further, since $Mr = \mathbf{0}$, we have $f'r = z'Mr = 0$, so that the fitted values are uncorrelated with the residuals, and the regression lines in the scatter diagram are at right angles.

Having drawn the scatter diagram, we may use its appearance to check whether the assumptions of the fitted model are satisfied. In particular, the homoscedasticity assumption may be checked roughly in terms of the dispersion of the residuals for different sub-ranges of fitted values, and the normality assumption may be checked in the same way, especially so far as skewness is concerned. In each case, an appropriate transformation may be made by the methods discussed earlier in this chapter if the assumption is found to be inadequate.

Perhaps the most interesting use of the scatter diagram, however, is to check additivity assumptions in multi-factor experiments. Non-additivity can manifest itself by evident non-linear (say, quadratic) regression of the residuals upon fitted values.

37.20 The rough visual methods described above can be translated into numerical terms. Anscombe (1961) proposed to use the statistics

$$t_{pq} = r_p' f_q,$$

where $r_p$ is the vector of the $p$th powers of the residuals (e.g. $r$ defined above is $r_1$) and $f_q$ is the vector of $q$th powers of the fitted values ($f = f_1$). $t_{30}$ and $t_{40}$ are obvious analogues of the usual measures of skewness and kurtosis. $t_{21}$ measures heteroscedasticity, since it essentially gives the covariance of the squared residuals with the fitted values; $t_{12}$ measures non-additivity on the lines indicated at the end of 37.19. In fact, it is very closely related to the statistic used at (35.69) for testing additivity in a two-way cross-classification with one observation per cell: the numerator of that statistic is just $t_{12}^2$.
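The quantities of 37.18-20 are easily computed. The sketch below restricts itself, purely for brevity, to a single regressor (so that $M = xx'/(x'x)$); the data are made up, and the helper names `centre`, `fit_one_regressor` and `t_stat` are our own. It forms the fitted values and residuals, and then statistics of the form $t_{pq} = \sum_i r_i^p f_i^q$.

```python
def centre(v):
    """Measure a list of values from its mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def fit_one_regressor(x, y):
    """LS fit of centred y on one centred regressor; returns (fitted values, residuals)."""
    xc, z = centre(x), centre(y)
    theta_hat = sum(a * b for a, b in zip(xc, z)) / sum(a * a for a in xc)
    f = [theta_hat * a for a in xc]          # f = X theta_hat = M z
    r = [b - a for a, b in zip(f, z)]        # r = z - f = (I - M) z
    return f, r

def t_stat(r, f, p, q):
    """Statistic t_pq: sum over observations of r_i^p * f_i^q."""
    return sum((ri ** p) * (fi ** q) for ri, fi in zip(r, f))

# Made-up data, roughly quadratic in x, so that a straight-line fit is inadequate.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 3.9, 9.1, 15.8, 25.3, 35.9]
f, r = fit_one_regressor(x, y)

if __name__ == "__main__":
    print("sum of residuals:", sum(r))
    print("t_11 (should vanish):", t_stat(r, f, 1, 1))
    print("t_12, t_21, t_30:", t_stat(r, f, 1, 2), t_stat(r, f, 2, 1), t_stat(r, f, 3, 0))
```

Here the residuals sum to zero and are orthogonal to the fitted values, as the theory requires, while the curvature in the data shows up as a clearly non-zero $t_{12}$.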


37.21 Approximate sampling theory for the $t_{pq}$ (suitably standardized) is developed by Anscombe (1961) and leads to approximations to the power transformations required to remove the corresponding departure from the model's assumptions. In accordance with our discussion of 37.9, there is no guarantee that all these statistics will point to the same power transformation, but a general hope that they will not differ by much. In this connexion, it is interesting that Box and Cox (1964) (see also the discussion of their paper by Anscombe) expressed the ML solution (37.7) for the power transformation approximately in terms of the $t_{pq}$ with $p + q = 3$ and 4. In essence, the ML estimator carries out a kind of averaging process between the various power transformations suggested by the individual measurements of heteroscedasticity, non-normality and non-additivity. It is not the least of its virtues that the ML approach automatically effects what might otherwise be a difficult compromise to make.


The robustness of AV procedures 

37.22 We first consider estimation of the parameters in AV problems. Where Model I AV is concerned, LS estimation theory and its optimum properties (set out in 19.4-9, Vol. 2) do not at all involve the assumption of normality for the errors. Thus all estimates remain valid, and so do their estimated variances, in face of non-normality. LS estimation is distribution-free to this extent. The normality assumption was required in 24.27-37 for hypothesis-testing and interval estimation purposes only.

Further, even if the basic LS model (19.8) is incorrect in respect of its assumption of uncorrelated, homoscedastic errors, this will not bias the estimator (19.12), for (19.13-14) hold so long as the errors have zero means. But the MV properties are lost
in this case. Thus heteroscedasticity and correlation of errors merely reduce efficiency without importing bias.

For Model II AV, and the mixed models considered in Chapter 36, it is easy to see that the Mean Squares remain unbiassed as estimators of the variance components, unaffected by failure of the normality assumption. However, their variances are changed by non-normality, the normal-theory results being simplified by the special relations between the cumulants of the normal distribution.
37.23 So far as tests of hypotheses (and the corresponding interval estimators) are concerned, we noticed in 31.2-9, Vol. 2, the outstanding general feature of the effects of non-normality upon normal-theory procedures: tests on means are robust, while tests on variances are not. This generalization leads us to expect that tests and interval estimates in Model I, which is essentially concerned with means, will be relatively robust to non-normality; and that those in Model II and mixed models, which are concerned with variances, will not. We examine robustness to non-normality in detail in 37.24-35, but remark here that these statements have been substantiated in some detailed studies. Box (1954) also showed that in Model I the effects of heteroscedasticity are small when equal frequencies are used in a one-way classification. The practical implication is that equal cell-frequencies should be used wherever
possible when the observations are designed. As a happy side-effect, computations are made much easier by this conclusion. Further, this robustness to heteroscedasticity in the balanced case permits us to make a simple approximate AV of cell means when cell frequencies are unequal (cf. Exercises 37.7-8).

The effects of stochastic dependence among the errors can be extreme (Box, 1954). This recalls a general point made in 36.39-43, that randomization methods of allocating experimental units (which may obviously introduce some dependencies among the errors) should be taken into account in the analysis. We shall return to these methods in Chapter 38.


Robustness to non-normality in the linear model 

37.24 An interesting approach due to Box and Watson (1962), following earlier work by Box and Andersen (1955), throws some general light on the reasons for varying degrees of robustness to non-normality.

We consider the general linear model defined in 37.18. The SS attributable to the fitted model is

$$S_\theta = f'f = z'Mz,$$

and the Residual SS is

$$S_R = r'r = z'(I - M)z.$$

When errors are normally distributed, the LR test of $\theta = 0$ is based on

$$F = (S_\theta/p)/\{S_R/(N-p-1)\},$$

which has the variance-ratio form with d.fr. $\nu_1 = p$, $\nu_2 = N-p-1$. The test can be carried out equivalently on

$$W = \frac{z'Mz}{z'z} = \frac{S_\theta}{S_\theta + S_R},$$ (37.39)

for since

$$W = \left\{1 + \frac{N-p-1}{pF}\right\}^{-1},$$ (37.40)

it is a monotone increasing function of $F$. In the normal case, when $\theta = 0$, $1/F$ has the variance-ratio distribution with $(N-p-1, p)$ d.fr. and $W$ is the Beta variable, with parameters $\tfrac12 p$ and $\tfrac12(N-p-1)$, obtained from $1/F$ by the transformation in 16.19.

We now study the distribution of $W$ in the general (non-normal) case. Its denominator is invariant under permutation of the elements of $z$. We therefore first consider this permutation distribution (cf. 31.16) of $W$. If the joint distribution of the elements of $z$ is symmetric in its $N$ arguments, as will be so in particular when the errors are independently and identically distributed, each permutation of them has probability $(N!)^{-1}$.

Once we have obtained the mean and variance of $W$ in this permutation distribution, say $E_P(W)$ and $V_P(W)$, we shall be able to obtain the unconditional mean $E(W)$ and variance $V(W)$ from them if we know the parent distribution from which $z$ was sampled.

37.25 Since $z'Mz$ is a scalar,

$$z'Mz = \operatorname{tr}(z'Mz) = \operatorname{tr}(Mzz'),$$

where we commute under the trace operator. Thus (37.39) gives

$$z'z\,E_P(W) = E_P\{\operatorname{tr}(Mzz')\} = \operatorname{tr}\{M E_P(zz')\}.$$

Now

$$E_P(z_j^2) = z'z/N, \qquad E_P(z_j z_l) = -z'z/\{N(N-1)\}$$ (37.41)

for $j \ne l$. Substitution of (37.41) gives

$$E_P(W) = \frac{1}{N(N-1)}\operatorname{tr}\{M(NI - \mathbf{1}\mathbf{1}')\} = \frac{1}{N-1}\operatorname{tr}(M) = p/(N-1),$$ (37.42)

since $M\mathbf{1}\mathbf{1}' = 0$ by (37.35) and $\operatorname{tr}(M) = p$ from 19.9.

(37.42) shows that $E_P(W)$ does not depend upon $X$ or upon $z$. Thus, whatever the distribution of the errors, say $f$, the unconditional mean of $W$ will also be

$$E(W) = E\{E_P(W)\} = p/(N-1).$$ (37.43)
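The result $E_P(W) = p/(N-1)$ can be verified by brute force for a small sample. The sketch below assumes, for simplicity, a single centred regressor ($p = 1$, so that $W = (x'z)^2/\{(x'x)(z'z)\}$); the numerical values of $x$ and $z$ are arbitrary made-up data, and the function names are ours. It averages $W$ over all $N!$ permutations of $z$.

```python
from itertools import permutations

def centre(v):
    """Measure a list of values from its mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def w_statistic(x, z):
    """W = z'Mz / z'z for one centred regressor x, where M = x x' / (x'x)."""
    xz = sum(a * b for a, b in zip(x, z))
    return xz * xz / (sum(a * a for a in x) * sum(b * b for b in z))

def permutation_mean_w(x, z):
    """Average of W over all N! equiprobable permutations of the elements of z."""
    values = [w_statistic(x, perm) for perm in permutations(z)]
    return sum(values) / len(values)

x = centre([0.3, 1.1, 2.0, 2.2, 4.7, 5.0])   # arbitrary regressor values
z = centre([2.0, -1.0, 0.5, 7.0, 3.0, 1.5])  # arbitrary observations

if __name__ == "__main__":
    print(permutation_mean_w(x, z), 1 / (len(z) - 1))
```

Whatever values are substituted for $x$ and $z$, the two printed numbers agree, illustrating that the permutation mean depends on neither.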
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• TIST1 cS 

„ THEORY OF STA is completely 

'> ... J aD VANCED t» s0 that the ® ean 

Tn the normal c^. s ° I 

ticute, «iU hol ^ maUty . tbe e , ements of M , 

SS .0 <^ rtures £ ,, ( r). Writing w < 37 - 44 ) 

+ l p y :' z ) s U rs Mm- w e , 0 by (37.35), | 

We now write $s_r$ for the $r$th power-sum of the $z$'s as at (12.8) (so that $s_1 = 0$ by (37.35)), and use the relations between the augmented symmetric functions and power-sums to evaluate the expectations in (37.45). From the weight 4 section of Appendix Table 10, we find

$$\begin{aligned}
N\,E_P(z_r^4) &= s_4, \\
N^{(2)}E_P(z_r^2 z_s^2) &= s_2^2 - s_4, \\
N^{(2)}E_P(z_r^3 z_s) &= -s_4, \\
N^{(3)}E_P(z_r^2 z_s z_t) &= 2s_4 - s_2^2, \\
N^{(4)}E_P(z_r z_s z_t z_u) &= 3s_2^2 - 6s_4,
\end{aligned}$$ (37.46)

where distinct subscripts denote distinct observations and $N^{(k)} = N(N-1)\cdots(N-k+1)$.

Furthermore, we may express all the sums involving the $M_{rs}$ in (37.45) in terms of

$$m = \sum_{r=1}^{N} M_{rr}^2,$$

using the idempotency of $M$, the value $p$ of its trace, and the fact that $M\mathbf{1} = 0$ by (37.35). These relations, in which summation is over distinct subscripts, are:

$$\begin{aligned}
\sum M_{rr}M_{rs} &= -m, \\
\sum M_{rs}^2 &= p - m, \\
\sum M_{rr}M_{ss} &= p^2 - m, \\
\sum M_{rs}M_{st} &= 2m - p, \\
\sum M_{rr}M_{st} &= 2m - p^2, \\
\sum M_{rs}M_{tu} &= p^2 + 2p - 6m.
\end{aligned}$$ (37.47)

Substituting (37.46-7) into (37.45), and writing

$$k_2 = s_2/(N-1), \qquad k_4 = \{N(N+1)s_4 - 3(N-1)s_2^2\}/(N-1)^{(3)}$$

for the $k$-statistics as at (12.28), we find, on dividing by $(z'z)^2 = s_2^2$, subtracting $\{E_P(W)\}^2$ and simplifying,

$$V_P(W) = \frac{2p(N-p-1)}{(N+1)(N-1)^2} + \frac{k_4}{k_2^2(N-1)^2}\left\{m - \frac{p^2}{N} - \frac{2p(N-p-1)}{N(N+1)}\right\}.$$ (37.48)
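As a check on the algebra leading to (37.48), the permutation variance can be computed exhaustively for a small sample and compared with the closed form. The sketch again assumes a single centred regressor ($p = 1$), with arbitrary made-up data and function names of our own choosing; the closed form coded here is $2p(N-p-1)/\{(N+1)(N-1)^2\}$ plus $(k_4/k_2^2)(N-1)^{-2}\{m - p^2/N - 2p(N-p-1)/[N(N+1)]\}$.

```python
from itertools import permutations

def centre(v):
    """Measure a list of values from its mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def exact_permutation_variance(x, z):
    """Variance of W = (x'z)^2 / {(x'x)(z'z)} over all N! permutations of z."""
    sxx = sum(a * a for a in x)
    szz = sum(b * b for b in z)
    ws = [sum(a * b for a, b in zip(x, perm)) ** 2 / (sxx * szz)
          for perm in permutations(z)]
    mean = sum(ws) / len(ws)
    return sum((w - mean) ** 2 for w in ws) / len(ws)

def formula_variance(x, z):
    """Closed form for V_P(W) with p = 1, in terms of k-statistics of z and m."""
    n, p = len(z), 1
    s2 = sum(b ** 2 for b in z)
    s4 = sum(b ** 4 for b in z)
    k2 = s2 / (n - 1)
    k4 = (n * (n + 1) * s4 - 3 * (n - 1) * s2 ** 2) / ((n - 1) * (n - 2) * (n - 3))
    sxx = sum(a * a for a in x)
    m = sum((a * a / sxx) ** 2 for a in x)   # sum of squared diagonal elements of M
    normal_part = 2 * p * (n - p - 1) / ((n + 1) * (n - 1) ** 2)
    extra = (k4 / k2 ** 2) / (n - 1) ** 2 * (m - p * p / n - 2 * p * (n - p - 1) / (n * (n + 1)))
    return normal_part + extra

x = centre([0.3, 1.1, 2.0, 2.2, 4.7, 5.0])   # arbitrary regressor values
z = centre([2.0, -1.0, 0.5, 7.0, 3.0, 1.5])  # arbitrary observations

if __name__ == "__main__":
    print(exact_permutation_variance(x, z), formula_variance(x, z))
```

The two values agree to rounding error; the second term of the closed form shows how the departure from the normal-theory variance is driven jointly by $k_4/k_2^2$ and by $m$.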

(37.48) depends upon $X$ through $m$, and upon $z$ through the ratio $k_4/k_2^2$.

37.27 We see that

$$V(W) = E(W^2) - \{E(W)\}^2, \qquad V_P(W) = E_P(W^2) - \{E_P(W)\}^2,$$

and since, by (37.42-3), $E_P(W) = E(W)$ whatever $z$, it follows that if $f$ is the distribution of the errors,

$$V(W) = E\{V_P(W)\}.$$ (37.49)

The unconditional variance of $W$ thus depends essentially on $E(k_4/k_2^2)$ in (37.48). In the normal case,

$$E(k_4/k_2^2) = E(k_4)/E(k_2^2) = 0,$$

since $k_2$ is distributed independently of $k_4/k_2^2$. ($k_2$ is a complete sufficient statistic for, and $k_4/k_2^2$ is distributed free of, the scale parameter, and Exercise 23.7 applies.) Thus the first term on the right of (37.48) is the normal-theory unconditional variance, and we may rewrite the equation as

$$V_P(W) = \{V(W)\}_{\text{Normal}}\left[1 + C_Y\,\frac{N+1}{2p(N-p-1)}\left\{m - \frac{p^2}{N} - \frac{2p(N-p-1)}{N(N+1)}\right\}\right],$$ (37.50)

where

$$C_Y = k_4/k_2^2.$$

37.28 Exercise 37.10 asks the reader to show that $m$ can be expressed in terms of the ratios $(k_4/k_2^2)_i$ and $\{k_{22}/(k_{20}k_{02})\}_{ij}$ of the $p$ regressors $x$. Using that result, (37.50) may be written in the form

$$V_P(W) = \{V(W)\}_{\text{Normal}}\left[1 + \frac{N-3}{2N(N-1)}\,C_Y C_X\right],$$ (37.51)

where

$$C_X = \frac{N(N^2-1)}{p(N-p-1)(N-3)}\left\{m - \frac{p^2}{N} - \frac{2p(N-p-1)}{N(N+1)}\right\},$$ (37.52)

the right-hand side being expressible, by Exercise 37.10, in terms of the ratios $(k_4/k_2^2)_i$ and $\{k_{22}/(k_{20}k_{02})\}_{ij}$ of the regressors, and

$$\{V(W)\}_{\text{Normal}} = \frac{2p(N-p-1)}{(N+1)(N-1)^2}.$$ (37.53)

(37.52) shows that $C_X$ is a multivariate generalization of the univariate kurtosis ratio $k_4/k_2^2$, to which it reduces when $p = 1$. It has zero mean in the normal case, by the argument given for $k_4/k_2^2$ in 37.27.

37.29 The permutation distribution's moments (37.42) and (37.51) permit us to fit a continuous distribution to the discrete permutation distribution of $W$ in the manner of 31.47, by choosing a Beta distribution with the same mean and variance. Since both $W$ and the fitted distribution are on the range (0, 1), and we know (cf. 37.24) that this distribution holds exactly for the unconditional distribution of $W$ in normal samples, there is a reasonable hope of obtaining a good approximation to the general permutation distribution.

The mean and variance of a Beta distribution with parameters $\tfrac12\nu_1$, $\tfrac12\nu_2$ are (from Example 2.8)

$$\mu_1' = \frac{\nu_1}{\nu_1 + \nu_2}, \qquad \mu_2 = \frac{2\nu_1\nu_2}{(\nu_1 + \nu_2 + 2)(\nu_1 + \nu_2)^2}$$ (37.54)

respectively. Equating these to $E_P(W)$ and $V_P(W)$, and remembering that $E_P(W)$ is constant at $p/(N-1)$ by (37.42), we find, on substituting (37.51) in (37.54),

$$\nu_1 = p\left\{1 + \frac{(N+1)c}{N-1-2c}\right\}^{-1},$$ (37.55)

where $c$ is the "correction factor" in (37.51), i.e.

$$c = \frac{N-3}{2N(N-1)}\,C_Y C_X.$$ (37.56)

It follows that

$$\nu_2 = (N-p-1)\left\{1 + \frac{(N+1)c}{N-1-2c}\right\}^{-1}.$$ (37.57)

When $C_Y$ or $C_X = 0$, $c = 0$ and normal theory holds for $V_P(W)$, to our approximation, whatever the underlying distribution of the errors.

37.30 (37.49) and (37.51) show that the unconditional variance $V(W)$ is simply $V_P(W)$ with $C_Y C_X$ replaced by $E(C_Y C_X)$. With this modification, the approximation of 37.29 holds for the unconditional distribution of $W$ as well as for its permutation distribution.
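The adjustment of the d.fr. described in 37.29 is simple to carry out numerically. The following sketch uses function names of our own invention and purely illustrative values of $N$, $p$, $C_Y$ and $C_X$. It computes the correction factor $c = (N-3)C_Y C_X/\{2N(N-1)\}$, multiplies $p$ and $N-p-1$ by $\{1 + (N+1)c/(N-1-2c)\}^{-1}$, and confirms that the fitted Beta distribution has mean $p/(N-1)$ and variance equal to the normal-theory variance times $(1 + c)$.

```python
def correction_factor(n, c_y, c_x):
    """Correction factor c = (N - 3) C_Y C_X / {2 N (N - 1)}."""
    return (n - 3) / (2.0 * n * (n - 1)) * c_y * c_x

def adjusted_dfr(n, p, c):
    """Adjusted d.fr. (nu1, nu2): p and N - p - 1 times a common multiplier."""
    d = 1.0 / (1.0 + (n + 1) * c / (n - 1 - 2.0 * c))
    return d * p, d * (n - p - 1)

def beta_mean_var(nu1, nu2):
    """Mean and variance of a Beta variable with parameters nu1/2 and nu2/2."""
    total = nu1 + nu2
    return nu1 / total, 2.0 * nu1 * nu2 / ((total + 2.0) * total ** 2)

if __name__ == "__main__":
    n, p = 30, 3                                   # illustrative sizes
    c = correction_factor(n, c_y=1.2, c_x=-2.0 * (n - 1) / (n - 3))
    nu1, nu2 = adjusted_dfr(n, p, c)
    print(c, nu1, nu2, beta_mean_var(nu1, nu2))
```

Since both d.fr. are scaled by the same multiplier, the mean of the fitted distribution is unchanged and only its variance is adjusted.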

37.31 Since the approximating d.fr. defined at (37.55-7) depend essentially on the correction factor $c$, it is of interest to find bounds for its constituents. Exercise 37.11 is to show that

$$p^2/N \le m \le p(N-1)/N,$$ (37.58)

and hence, from (37.52),

$$-2 \le C_X(N-3)/(N-1) \le N-1.$$ (37.59)

We see that, if these bounds for $C_X$ are attained or nearly so, the correction factor $c$ at (37.56) will be of order $N^{-1}$ near the lower bound and of order $N^0$ near the upper bound. This will determine, at least in large samples, whether the correction factor is negligible or not, i.e. whether the distribution of $W$ is robust or not. Since we have seen at (37.52) that the deviation of $C_X$ from zero is a measure of the multivariate non-normality of the regressors, we may say that normality of the regressors produces robustness.

Robustness to non-normality in one-way AV

37.32 We now apply the general results for the linear model in 37.22-31 to the particular case of the one-way classification. Since there are $(p+1)$ parameters $(\theta_0, \theta)$ in (37.36), we suppose that there are $(p+1)$ groups in the classification (cf. the re-parametrized form of the model in Example 35.1) and that there are $n_j$ observations in the $j$th group, $\sum_{j=1}^{p+1} n_j = N$. Each of the $p$ regressors must simply indicate whether an observation does or does not fall into a particular group; membership of the $(p+1)$th group is implied by non-membership of the others. Ordinarily, we should define the regressors as 0-1 variables for this purpose, but they must satisfy (37.36) and we therefore instead define

$$x_{ij} = \begin{cases} 1 - n_j/N & \text{when the observed } y_i \text{ falls into the } j\text{th group,} \\ -n_j/N & \text{when it does not,} \end{cases}$$ (37.60)

for $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, p$. Then $\sum_i x_{ij} = 0$ as required for each $j$.

In this case, we know from Example 35.1 that the SS attributable to the fitted model is, using bars instead of dot suffixes for averages,

$$z'Mz = \sum_{j=1}^{p+1} n_j(\bar y_j - \bar y)^2 = \sum_{j=1}^{p+1}\frac{\left(\sum_i y_{ij}\right)^2}{n_j} - \frac{\left(\sum_j\sum_i y_{ij}\right)^2}{N},$$ (37.61)

and from (37.61) we see by considering the coefficient of $y_r^2$ that $M_{rr} = \dfrac{1}{n_j} - \dfrac{1}{N}$ for every member of the $j$th group. Thus

$$m = \sum_{r=1}^{N} M_{rr}^2 = \sum_{j=1}^{p+1} n_j\left(\frac{1}{n_j} - \frac{1}{N}\right)^2 = \sum_{j=1}^{p+1}\frac{1}{n_j} - \frac{2p+1}{N}$$

and (37.52) becomes

$$\frac{N-3}{N-1}\,C_X = \frac{N(N+1)}{p(N-p-1)}\left(\sum_{j=1}^{p+1}\frac{1}{n_j} - \frac{(p+1)^2}{N}\right) - 2,$$ (37.62)

which can be substituted into (37.51) to get $V_P(W)$, first given by Welch (1937). Evidently the value of $C_X$ will depend critically on how the $N$ observations are allocated to the $(p+1)$ groups.
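The dependence of $C_X$ on the allocation of observations to groups is easily explored numerically. The sketch below uses our own function names and made-up group sizes. It evaluates $m = \sum_j n_j(1/n_j - 1/N)^2$ and the scaled quantity $(N-3)C_X/(N-1) = N(N+1)\{\sum_j 1/n_j - (p+1)^2/N\}/\{p(N-p-1)\} - 2$ for a list of $p + 1$ group sizes.

```python
def m_one_way(group_sizes):
    """m = sum of squared diagonal elements of M in a one-way classification."""
    n = sum(group_sizes)
    return sum(nj * (1.0 / nj - 1.0 / n) ** 2 for nj in group_sizes)

def scaled_c_x(group_sizes):
    """(N - 3) C_X / (N - 1) for p + 1 groups of the given sizes."""
    n = sum(group_sizes)
    p = len(group_sizes) - 1
    recip = sum(1.0 / nj for nj in group_sizes)
    return n * (n + 1) / (p * (n - p - 1)) * (recip - (p + 1) ** 2 / n) - 2.0

if __name__ == "__main__":
    for sizes in ([5, 5, 5, 5], [2, 4, 6, 8], [1, 1, 1, 17]):   # made-up allocations, N = 20
        print(sizes, round(m_one_way(sizes), 4), round(scaled_c_x(sizes), 4))
```

Equal group sizes give the value $-2$, while the more unequal the allocation, the closer the scaled $C_X$ moves towards the upper bound $N - 1$ of (37.59).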


37.33 If $n_j = N/(p+1)$, so that all groups have the same frequency, (37.62) becomes

$$\frac{N-3}{N-1}\,C_X = -2,$$ (37.63)

attaining the lower bound in (37.59). (37.56) is therefore

$$c = -C_Y/N,$$

and the multiplier to be applied to $p$ and $(N-p-1)$ in (37.55) and (37.57) is

$$\left\{1 + \frac{2C_Y}{N(N-1)}\right\}\Big/\left\{1 - \frac{C_Y}{N}\right\},$$

which differs from unity only by terms of order $N^{-1}$, so that the correction is slight. The correction can actually be reduced to zero (cf. the Exercises) by making the frequencies unequal in a certain way, but in general it seems unwise to do this at the expense of losing the robustness to heteroscedasticity noted in 37.23.

37.34 At the opposite extreme, consider the case

$$n_1 = n_2 = \cdots = n_p = 1, \qquad n_{p+1} = N - p.$$ (37.64)

$C_X$ is then very close to the upper bound in (37.59), $c$ at (37.56) is approximately $\tfrac12 C_Y$, and the multiplier in (37.55) and (37.57) is approximately $(1 + \tfrac12 C_Y)^{-1}$, indicating extreme non-robustness.

The point of interest about this opposite extreme case emerges only when we examine the SS (37.61), which is

$$z'Mz = \sum_{j=1}^{p}(y_j - \bar y)^2 + (N-p)(\bar y_{p+1} - \bar y)^2.$$ (37.65)

As $N$ increases, $\bar y_{p+1}$ and $\bar y$ will differ negligibly. Thus, to a close approximation, (37.65) is simply

$$\sum_{j=1}^{p}(y_j - \bar y_\cdot)^2 + p(\bar y_\cdot - \bar y_{p+1})^2,$$

where $\bar y_\cdot = \sum_{j=1}^{p} y_j/p$. The $F$-test of 37.24 will be based on the ratio of this to

$$\sum_{i=1}^{N-p}(y_{i,p+1} - \bar y_{p+1})^2.$$

Thus

$$F = \frac{\left\{\sum_{j=1}^{p}(y_j - \bar y_\cdot)^2 + p(\bar y_\cdot - \bar y_{p+1})^2\right\}\big/p}{\sum_{i=1}^{N-p}(y_{i,p+1} - \bar y_{p+1})^2\big/(N-p-1)}.$$ (37.66)

Apart from the term which compares $\bar y_\cdot$ and $\bar y_{p+1}$ in the numerator, and the corresponding extra degree of freedom there, (37.66) is the $F$-statistic for testing the equality of variances in two normal populations, from samples of size $p$ and $N-p$ (cf. Exercise 23.14). In the light of the results of this section, it is easy to understand the extreme non-robustness of the latter test, referred to in general terms in 37.23 and in more detail in 31.6-8, where essentially the same correcting multiplier $(1 + \tfrac12 C_Y)^{-1}$ was justified directly for tests on variances generally.

Robustness in balanced classifications

37.35 In the balanced one-way classification of 37.33, (37.63) and (37.52) show that

$$m = \sum_{r=1}^{N} M_{rr}^2 = p^2/N.$$ (37.67)

The lower bound in (37.58) will be attained, and a negligible correction for non-normality required, whenever (37.67) holds, i.e. whenever

$$M_{rr} = p/N \quad \text{for all } r,$$ (37.68)

i.e. whenever the diagonal elements of $M = X(X'X)^{-1}X'$ are all equal. A linear model satisfying (37.68) is said to be quadratically balanced. It is easily seen from considerations of symmetry alone that (37.68) holds for any cross-classification with equal frequencies in every cell (previously called "balanced") and for any hierarchical classification with equal frequencies at each stage of the hierarchy. Atiqullah (1962), using a different method, derived this as an asymptotic result for the unconditional distribution of $W$; his results also apply to other $F$-tests than the overall test of $\theta = 0$, which is the only one considered here.

In analysis of covariance in the one-way classification, with equal frequencies, Atiqullah (1964) showed that, in accordance with the conclusion of 37.31, the extent of non-normality of the concomitant variable determines the robustness of the $F$-test to non-normality of the errors; robustness to other departures from assumptions is also considered.


Distribution-free methods

37.36 The non-robustness of some of the standard AV methods leads us naturally to enquire whether alternative, robust, methods of analysis can be found. In other words, are there distribution-free methods for AV problems?

We have seen in 31.70-4 that so far as the one-way classification is concerned, the answer is in the affirmative: distribution-free tests exist for the equality of location parameters of $k$ samples from any continuous populations otherwise identical. They are based on the ranks themselves, or on the normal scores $E(s, n)$ (cf. 31.71, Vol. 2). These tests are completely distribution-free for any continuous distribution and have very high asymptotic relative efficiencies against normal location-shift alternatives (cf. 31.71). Another test, against ordered alternatives, was given in 31.72-4 (cf. 35.66 and Exercise 35.15).

37.37 Further, the permutation distribution of $W$ in the general linear model, discussed in 37.24-31, is distribution-free in the same way as permutation tests were in Chapter 31: the test of $\theta = 0$ holds as an approximate test for the symmetry of $z$ in its arguments whatever the underlying distribution of the errors. However, this test of $\theta = 0$ does not carry us very far into AV except for the one-way classification, as we saw in 37.32-5. We now have to consider how far distribution-free methods may be of use in more complex AV situations, where we wish to test main effects, interactions, etc.

Two-way cross-classification: permutation test 
37.38 Consider the simplest two-way cross-classification, with one observation per cell. Suppose that there are $r$ rows and $c$ columns, so that there are $n = rc$ observations in all, and that we wish to test column- (or row-) effects. In the spirit of our discussions of 31.21 and 31.39, the most natural procedure would be to replace the $n$ observations $y_{ij}$ by their ranks, or by some other set of conventional numbers such



statistics 

sT It is difficult to 


„ TBEOBT ur ~ „ these, it is uuucuit to 

» -i 

as the norn w ith the w hich to 

. f „ r column-effect S fr<>” which, 


equipro' 


oermutation^ 31 *'" 1 

may develop^ elTie rge. « y [ a invariant f 

.tioa-free testS _ , nntr e„ = fS W ^ 


test for column' 


1 


.effects, say 


5c 


= 1 


37.3’ However^” ^ tests 

a 8 «e *»" for test ing . r „w. 

^^. AV !l s »ttoeach°bse^ n ” any - 


If we take the mean 


• n f a constant to eatn — 

under the addition °1 he I0W s SS, say 

-r;r, 


C 2 ()’i.' 
i=l 


y y, equal to zero 


since 


(37.69) 


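Now

$$F = \frac{S_C/(c-1)}{(S-S_R-S_C)/\{(r-1)(c-1)\}}$$

with d.fr. $\nu_1 = c-1$ and $\nu_2 = (r-1)(c-1)$, and its Beta transform is $W = S_C/(S-S_R)$. $S-S_R$ is invariant under arbitrary addition of constants to the rows, and therefore so is $W$. We may therefore without loss of generality put every row mean $\bar{y}_{i.} = 0$, so that $S_R = 0$ and

$$W = \frac{S_C}{S} = \frac{\dfrac{1}{r}\sum_{j=1}^{c}\Bigl(\sum_{i=1}^{r} y_{ij}\Bigr)^{2}}{\sum_{i=1}^{r}\sum_{j=1}^{c} y_{ij}^{2}} = \frac{1}{r}\left\{1 + \frac{U}{(c-1)\sum_{i=1}^{r}(k_2)_i}\right\}, \tag{37.70}$$

where

$$U = \sum_{i=1}^{r}\;\sum_{\substack{l=1 \\ l \neq i}}^{r}\;\sum_{j=1}^{c} y_{ij}\,y_{lj}$$

and $(k_p)_i$ is the $p$th $k$-statistic of the $y$-values in the $i$th row, $y_{i1}, y_{i2}, \ldots, y_{ic}$.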


37.40 We can now find the moments of H7 70') me* „ ,•, , 

under a hypothesis of symmetry. Here, the ^*1“^ ' ° f (37 ' 40 >' 

effects in the class fication Because nf tL • yP , that there af e no column- 
additions to the rows, * nee" *~ ° hence IV, to con ™ 

Pitman (1938), whose paper should be consulted for details, found the first four moments of U, and thence of W (cf. the test of bivariate symmetry discussed in 31.78). He found

$$E_P(W) = \frac{1}{r}. \tag{37.71}$$


Just as at (37.42-3), the mean of the permutation distribution is the same whatever the observations and therefore coincides with the normal-theory result. The variance is more complicated, being (cf. Exercise 37.13)

$$V_P(W) = \frac{2}{r^{2}(c-1)}\left\{1 - \frac{\sum_{i=1}^{r}(k_2)_i^{2}}{\bigl\{\sum_{i=1}^{r}(k_2)_i\bigr\}^{2}}\right\}, \tag{37.72}$$

with more complicated expressions for the higher moments.

The normal theory variance is, as in 37.29,

$$\{V(W)\}_{\text{Normal}} = \frac{2\nu_1\nu_2}{(\nu_1+\nu_2+2)(\nu_1+\nu_2)^{2}} = \frac{2(r-1)}{r^{2}\{r(c-1)+2\}} \tag{37.73}$$

and, as at (37.49), this is the expectation of (37.72) under normality. Solving (37.72-3), we find that

$$E\left[\frac{\sum_{i}(k_2)_i^{2}}{\bigl\{\sum_{i}(k_2)_i\bigr\}^{2}}\right] = \frac{c+1}{r(c-1)+2}. \tag{37.74}$$

The d.fr. of the F-test can be adjusted as in 37.29; Exercise 37.14 gives an instance. Pitman (1938) showed that when the mean and variance are made to agree with those of a Beta distribution in this way, the third and fourth moments also generally show good agreement.
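The two permutation moments quoted at (37.71) and (37.72) can be checked by brute force on a small table. The following is a minimal sketch, not part of the original text: the function names and the 3 x 3 data are hypothetical, and the check simply enumerates all $(c!)^r$ within-row permutations, which is feasible only for very small r and c.

```python
import itertools
import statistics

def w_statistic(table):
    """W = S_C / (S - S_R) for an r x c table with one observation per cell."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    s_total = sum((y - grand) ** 2 for row in table for y in row)
    s_rows = c * sum((sum(row) / c - grand) ** 2 for row in table)
    col_means = [sum(row[j] for row in table) / r for j in range(c)]
    s_cols = r * sum((m - grand) ** 2 for m in col_means)
    return s_cols / (s_total - s_rows)

def exact_permutation_moments(table):
    """Mean and variance of W over all (c!)^r equiprobable within-row permutations."""
    row_perms = [list(itertools.permutations(row)) for row in table]
    values = [w_statistic(rows) for rows in itertools.product(*row_perms)]
    return statistics.fmean(values), statistics.pvariance(values)

def variance_formula(table):
    """Right-hand side of (37.72); (k_2)_i is the sample variance of row i."""
    r, c = len(table), len(table[0])
    k2 = [statistics.variance(row) for row in table]
    ratio = sum(k * k for k in k2) / sum(k2) ** 2
    return 2.0 / (r * r * (c - 1)) * (1.0 - ratio)

# Hypothetical data: 3 rows, 3 columns.
table = [[4.1, 5.3, 6.0], [2.2, 2.9, 5.1], [7.4, 6.8, 9.9]]
mean_w, var_w = exact_permutation_moments(table)
```

Here `mean_w` should equal 1/r and `var_w` should agree with `variance_formula(table)` up to rounding error, whatever numbers are placed in the table.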


37.41 The most interesting special cases of W for our present purposes are those in which the observations $y_{ij}$ are replaced by conventional numbers, so that the test is made completely distribution-free. Instead of the procedure outlined in 37.38, we shall now replace the $y_{ij}$ in each row separately by a set of conventional numbers, e.g. their ranks or the corresponding normal scores. If we use the same set in each row, an immediate consequence is that the $\{(k_p)_i\}$ are identical for all values of i. Thus

$$\frac{\sum_{i}(k_2)_i^{2}}{\bigl\{\sum_{i}(k_2)_i\bigr\}^{2}} = \frac{r k_2^{2}}{(r k_2)^{2}} = \frac{1}{r} \tag{37.75}$$

simply. If (37.75) is compared with (37.74), it is seen that they differ negligibly if either r or c is not too small. The distribution-free test statistic using the same set of conventional numbers to replace the observations in each row will then have approximately the same distribution as the normal theory test. In particular, this is true if the ranks or the normal scores are used in place of the observations.

It should be noted that when conventional numbers are used in the test, there is no need to put $\bar{y}_{i.} = 0$, for $S_R$ will be = 0 in any case. Of course $\bar{y}_{..}$ (a constant) must be restored to $S_C$ and S in this case, as in Exercise 37.14.
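As an illustration of the rank case, the sketch below replaces each row by its ranks and computes W both from the general definition $S_C/(S-S_R)$ and from the rank-sum form $12T/\{r^2c(c^2-1)\}$ of Exercise 37.14. It is a hedged example only: the helper names and the data are hypothetical, and ties within a row are assumed absent.

```python
def within_row_ranks(table):
    """Replace each row by the ranks 1..c of its entries (assumes no ties)."""
    ranked = []
    for row in table:
        order = sorted(range(len(row)), key=lambda j: row[j])
        ranks = [0] * len(row)
        for rank, j in enumerate(order, start=1):
            ranks[j] = rank
        ranked.append(ranks)
    return ranked

def w_general(table):
    """W = S_C / (S - S_R), computed directly from the sums of squares."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    s_total = sum((y - grand) ** 2 for row in table for y in row)
    s_rows = c * sum((sum(row) / c - grand) ** 2 for row in table)
    s_cols = r * sum((sum(row[j] for row in table) / r - grand) ** 2 for j in range(c))
    return s_cols / (s_total - s_rows)

def w_from_rank_sums(ranks):
    """W = 12 T / {r^2 c (c^2 - 1)}, with T the sum of squared deviations of rank sums."""
    r, c = len(ranks), len(ranks[0])
    col_sums = [sum(row[j] for row in ranks) for j in range(c)]
    t = sum((s - r * (c + 1) / 2) ** 2 for s in col_sums)
    return 12 * t / (r * r * c * (c * c - 1))

# Hypothetical data: 4 rows, 3 columns.
data = [[4.1, 5.3, 6.0], [2.2, 2.9, 5.1], [7.4, 6.8, 9.9], [3.0, 1.0, 2.0]]
ranks = within_row_ranks(data)
w_direct = w_general(ranks)
w_ranks = w_from_rank_sums(ranks)
chi2_statistic = len(ranks) * (len(ranks[0]) - 1) * w_ranks  # r(c-1)W
```

For these data the column rank sums are 7, 6 and 11, so T = 14 and both routes give W = 0.4375; the large-sample statistic r(c-1)W is then 3.5, to be referred to $\chi^2$ with c-1 = 2 d.fr.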
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108 i ^tions r P A V tests o f C ° 1U " a effects, in a two-wa y 

More complex classifications

37.42 Thus distribution-free AV tests of column- (or of row-) effects exist in a two-way cross-classification with one observation per cell. The simplicity of this situation arises because, as we saw in 37.39, the total SS (neglecting the mean) has only the three components $S_R$, $S_C$ and $S-S_R-S_C$, of which one ($S_R$) can be rendered identically zero; the test statistic is then a ratio of the only two random variables in the problem. Under permutation, S becomes a constant, so that the Beta transform W is just a constant multiple of a single random variable, $S_C$.

As soon as we begin to consider generalizing this permutation test, this simplicity disappears. Even for a two-way cross-classification with more than one observation per cell, the SS has a further (Interaction) component; for a three-way classification with one observation per cell, more than one further component appears, and the permutation distribution of a test statistic for column-effects would be difficult to obtain.


37.43 An alternative method of generalization would be to consider the analogue of one of the distribution-free statistics in more complex situations. For example, the statistic W of Exercise 37.14 is a multiple of T, essentially the variance of the column rank-sums, and its distribution might be sought for the balanced two-way cross-classification, and possibly also for the unequal-frequencies case; and for the three-way (r × c × l) cross-classification under the $(c!)^{rl}$ equiprobable permutations of the column ranks within each row and each layer of the classification.

So far as we know, neither of these generalizations has been carried out.

Finally, it does not seem possible to obtain a distribution-free test for interactions by this method.


Median tests 

37.44 A different approach to the construction of distribution-free AV tests was followed by Brown and Mood (1951). The principle of test construction which they used is (a) to estimate parameters unspecified by the hypothesis by median statistics and then (b) to test whether the residuals from this median-estimated model have half of their signs negative and half positive.

For example, in the one-way classification, with $n_j$ observations in the $j$th group ($j = 1, 2, \ldots, k$), only the general mean is left unspecified by the hypothesis of equal group means. We therefore estimate it by the median $\tilde{y}$ of all $n$ observations, and count the number $m_j$ of observations in the $j$th group which exceed $\tilde{y}$, obtaining the table

$$\begin{array}{l|cccc|c}
\text{Group:} & 1 & 2 & \cdots & k & \text{Total} \\ \hline
\text{No. of observations} > \tilde{y} & m_1 & m_2 & \cdots & m_k & \tfrac12 n \\
\text{No. of observations} \le \tilde{y} & n_1 - m_1 & n_2 - m_2 & \cdots & n_k - m_k & \tfrac12 n \\ \hline
\text{TOTAL} & n_1 & n_2 & \cdots & n_k & n
\end{array} \tag{37.76}$$

The hypothesis now is that all k groups have the same median, i.e. $E(m_j) = \tfrac12 n_j$ for each j. It will be seen at once that this is the binomial homogeneity test treated in 33.55. The test statistic (33.122), which in the present notation is

$$4\sum_{j=1}^{k}\frac{(m_j - \tfrac12 n_j)^{2}}{n_j}, \tag{37.77}$$

is asymptotically distributed in the $\chi^2$ form with (k-1) d.fr. The test (which may be carried out exactly by the method of 33.19, Case 1) is distribution-free. As for the Sign test, the use of the median reduces the problem to a binomial one.
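The computation behind (37.76) and (37.77) is short enough to sketch. The example below is illustrative only: the function name and the two groups of data are hypothetical, and it assumes that n is even and that no observation coincides with the overall median, so that exactly half of the observations exceed it.

```python
import statistics

def median_test_statistic(groups):
    """Counts above the pooled median and the statistic 4 * sum((m_j - n_j/2)^2 / n_j)."""
    pooled = [x for group in groups for x in group]
    overall_median = statistics.median(pooled)
    counts = [sum(1 for x in group if x > overall_median) for group in groups]
    statistic = 4.0 * sum(
        (m - len(group) / 2) ** 2 / len(group) for m, group in zip(counts, groups)
    )
    return overall_median, counts, statistic

# Hypothetical data: k = 2 groups of 4 observations each.
groups = [[1, 2, 3, 4], [5, 6, 7, 8]]
overall_median, counts, statistic = median_test_statistic(groups)
degrees_of_freedom = len(groups) - 1
```

With these completely separated groups the pooled median is 4.5, the counts are 0 and 4, and the statistic is 8, on 1 d.fr.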


37.45 More complex AV situations may be treated by the same general method. For the two-way cross-classification, as Exercises 37.18-20 indicate, a variety of tests is available. With one observation per cell, or in the more general situation when there are no interactions, column- (or row-) effects may be tested; when interactions are present, column-effects may be tested against interactions, or column- and interaction-effects may be tested jointly.

37.46 These median tests are attractive because of their computational simplicity and the fact that their theory is immediately available, at least in large samples, from that of (2 × c) contingency tables. However, not every problem is soluble by their use, e.g. there is no test known for column-effects against residual in the general balanced two-way cross-classification. Further, even when a test is available, it is not always distribution-free: Brown and Mood (1951) show that a median test for interactions in the balanced two-way classification is not. Finally, the efficiency of these median tests is not generally as high as that of tests using ranks or normal scores when the errors are near-normal. In 32.6-7, the Sign test was found to have ARE of 2/π in the normal case, against 3/π for the ranks test; and in Exercise 31.1 , a test of randomness was seen to have ARE of 0.78 against the 0.98 found for tests using ranks. Andrews (1954) showed that the one-way classification test discussed in 37.44 has the same ARE, 2/π, as the Sign test while, as we saw in 31.71, the comparable test statistic (31.150) based on ranks has ARE 3/π.

Bhapkar (1963) gave some efficiency results for the two-way classification median test.


37.47 The restricted scope and relatively low efficiency of the median tests obtained by using the principle given in 37.44 are rather disappointing: intuitively, it seems that it ought to be possible to find general AV procedures with the high efficiencies which we saw in Chapter 31 to be characteristic of tests based on ranks. The nearest approaches to such procedures have been developed in a series of papers by Lehmann (1963a, b, c, 1964) and by Hodges and Lehmann (1963) (cf. also Hoyland (1965) and Bickel (1965)). These procedures are only asymptotically distribution-free and it is interesting that they, too, are based on median estimation methods of a different sort from those in 37.44.

37.48 Suppose that

$$x_{ip} = \mu_i + e_{ip} \qquad (i = 1, 2, \ldots, c;\ p = 1, 2, \ldots, n_i)$$

is the model for a set of $n = \sum_{i=1}^{c} n_i$ observations. The $e_{ip}$ are independent, but otherwise we assume only that they have the same distribution. We write

$$\theta_{ij} = \mu_i - \mu_j$$

for the parameters in terms of which all quantities of interest may be expressed. We shall discuss median estimators $\hat\theta_{ij}$ of $\theta_{ij}$.

Let $y_{ij}$ be the median of the $n_i n_j$ differences $(x_{ip} - x_{jq})$, where $p = 1, 2, \ldots, n_i$; $q = 1, 2, \ldots, n_j$. The $y_{ij}$ are clearly estimators of the $\theta_{ij}$, but they do not possess the desirable transitivity property that

$$\theta_{ij} + \theta_{jk} + \theta_{ki} = 0 \quad \text{for all } i, j, k. \tag{37.78}$$

Adjusted estimators which satisfy (37.78) are

$$\hat\theta_{ij} = \bar{y}_{i.} - \bar{y}_{j.}, \tag{37.79}$$

where $\bar{y}_{i.} = \dfrac{1}{c}\sum_{l=1}^{c} y_{il}$. Lehmann (1963a) gives a numerical illustration showing that the $\hat\theta_{ij}$ agree well with the usual LS estimators $\theta^{*}_{ij}$.

As $n \to \infty$ suitably, the $\hat\theta_{ij}$ tend to multivariate normality; they also have the same estimation efficiency, compared with the standard AV estimators based on means, as the Wilcoxon test has compared to "Student's" t-test. If f is the common frequency function of the $e_{ip}$, it follows from (31.115) that the efficiency is

$$12\sigma^{2}\left[\int_{-\infty}^{\infty}\{f(x)\}^{2}\,dx\right]^{2} = k^{2} \tag{37.80}$$

say. By 31.60-1, $k^2$ may be infinite, but can never be less than 0.864 for any continuous f; in the normal case, $k^2 = 3/\pi = 0.95$. Thus, generally, $k\,n^{1/2}(\hat\theta_{ij} - \theta_{ij})$ has the same limiting distribution as $n^{1/2}(\theta^{*}_{ij} - \theta_{ij})$, where $\theta^{*}_{ij}$ is the standard (LS) AV estimator of $\theta_{ij}$.
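The estimators of (37.78) and (37.79) are easy to compute for small samples. The sketch below is only an illustration under stated assumptions: the function names and the three samples are hypothetical, and $y_{ii}$ is taken as the median of the within-sample differences, which is zero by symmetry.

```python
import statistics

def median_differences(samples):
    """y[i][j] = median of all differences x_ip - x_jq between samples i and j."""
    c = len(samples)
    return [
        [statistics.median([x - z for x in samples[i] for z in samples[j]])
         for j in range(c)]
        for i in range(c)
    ]

def adjusted_estimators(y):
    """theta_hat[i][j] = ybar_i. - ybar_j., the adjusted estimators of (37.79)."""
    c = len(y)
    row_means = [sum(row) / c for row in y]
    return [[row_means[i] - row_means[j] for j in range(c)] for i in range(c)]

# Hypothetical samples from c = 3 groups.
samples = [[1.0, 2.0, 4.0], [2.5, 3.0, 7.0], [0.5, 6.0, 6.5]]
y = median_differences(samples)
theta_hat = adjusted_estimators(y)

raw_cycle = y[0][1] + y[1][2] + y[2][0]                               # need not vanish
adjusted_cycle = theta_hat[0][1] + theta_hat[1][2] + theta_hat[2][0]  # vanishes
```

For these samples the raw medians are $y_{12} = -1.5$, $y_{23} = 0.5$ and $y_{31} = 2.5$, so the raw cycle sums to 1.5 and transitivity fails, whereas the adjusted estimators satisfy (37.78) identically.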

37.49 The asymptotic property stated in the last sentence of 37.48 implies that, asymptotically, we may set up analogues of all the usual AV procedures. Lehmann (1963c) gives two consistent estimators of k, and Lehmann (1963b) develops large-sample confidence intervals for any contrast or set of contrasts in the parameters. Further, the same author (1964) (see also Hodges and Lehmann (1962)) extends his results to the situation where there are "nuisance factors" in the observations (in the terminology of Chapter 38, "blocks" within the experiment) with equal numbers of observations in each cell of the same "block."

Bhuchongkul and Puri (1965) extend the asymptotic theory to a class of estimators of contrasts including those based on normal scores.

Missing observations 

37.50 The advantages of balanced arrangements in Model I AV, namely orthogonality, ease of computation and superior robustness, are such that most analyses will seek to take advantage of them. Nevertheless, force of circumstances may sometimes lead to involuntary departures from the intended equality of frequencies: plants or animals may die, human subjects may prove reluctant to co-operate, or records may be lost before analysis. If this happens, we are always free to analyse the achieved unequal frequencies by the appropriate non-orthogonal methods, but, as we have seen, these are often complicated. Moreover, accidental losses of observations are rarely extreme; usually only one or a few are found to be missing. It is therefore worth investigating whether we can retain the original AV structure and correct it for the missing observations, rather than abandon it altogether.

37.51 Suppose, then, that m of the n intended observations are missing. Without loss of generality, we take these to be the last m components of the observation vector y, which now become unknowns, say $u_1, \ldots, u_m$. Thus we may write $y = \begin{pmatrix} z \\ u \end{pmatrix}$, where z ((n-m) × 1) contains the actually observed values of y and u (m × 1) the unknown observations. In effect, we are presented with a fresh set of unknowns to estimate, in addition to the original parameters of the model. It is natural, in these circumstances, to estimate the values in u by the same LS method as we use for the original parameters.

37.52 The sum of squared residuals

$$S = (y - X\theta)'(y - X\theta)$$

must therefore now be minimized not only for variation in θ (as was done in 19.4) but also for variation in u. If we first minimize S with respect to θ, we shall, of course, obtain the original LS solution (i.e. the LS solution if there had been no missing observation), but the estimator and the Residual SS of that solution will now both be functions of u, say $\hat\theta(u)$ and $S_0(u)$. The minimization process could now be completed by minimizing $S_0(u)$ for variation in u.

However, this two-stage minimization procedure, which was suggested by Yates (1933), is not the easiest way in general. Instead, let us minimize S first for variation in u. Partitioning X into $\begin{pmatrix} X_z \\ X_u \end{pmatrix}$ conformably with the partition $y = \begin{pmatrix} z \\ u \end{pmatrix}$, we have

$$S = (z - X_z\theta)'(z - X_z\theta) + (u - X_u\theta)'(u - X_u\theta). \tag{37.81}$$

Since only the second of the two non-negative terms on the right of (37.81) depends upon u, we reduce it to zero by putting

$$u = X_u\theta; \tag{37.82}$$

thus S at (37.81) is reduced to its first term, which may then be minimized with respect to θ. But the results of the two minimization methods just described must be the same. Thus if we obtain

$$\hat\theta(u) = (X'X)^{-1}X'y = (X'X)^{-1}(X_z'z + X_u'u) \tag{37.83}$$

and put it into (37.82), we have

$$\hat{u} = X_u\hat\theta(\hat{u}). \tag{37.84}$$

(37.84) states that each missing observation is to be equated to its estimated expectation in the original LS analysis.

37.53 (37.84) is a set of linear equations to be solved for u and the solution $\hat{u}$ is then to be used in the original LS analysis. A straightforward solution was given by Tocher (1952).

First, suppose that we replace u by the null vector 0 in the original LS analysis. (37.83) becomes

$$\hat\theta(0) = (X'X)^{-1}X_z'z. \tag{37.85}$$

Now consider

$$\hat{u} = \{I - X_u(X'X)^{-1}X_u'\}^{-1}X_u\hat\theta(0). \tag{37.86}$$

Observing from (37.83) that

$$\hat\theta(\hat{u}) = \hat\theta(0) + (X'X)^{-1}X_u'\hat{u},$$

we find that (37.86) reduces to

$$\{I - X_u(X'X)^{-1}X_u'\}\hat{u} = X_u\hat\theta(\hat{u}) - X_u(X'X)^{-1}X_u'\hat{u},$$

so that

$$\hat{u} = X_u\hat\theta(\hat{u}). \tag{37.87}$$

Thus $\hat{u}$ defined by (37.86) is the solution of (37.84).


/ 


37.54 We have seen, therefore, that in order to estimate the m missing observations u, so that we may preserve the computational form of the original LS analysis, we need only

(a) perform the original analysis with u = 0 to obtain $\hat\theta(0)$ at (37.85);

(b) calculate $\hat{u}$ at (37.86); and

(c) again perform the original analysis using $\hat{u}$ in y.

It should be noted that the matrix in braces to be inverted in (37.86) is (m × m). Thus, if only one observation is missing, the matrix is a scalar and stage (b) above is very simple.

Wilkinson (1957-60), in a series of papers on missing observations, gives detailed solutions for many common AV situations. See also Biggers (1959).
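Stages (a) to (c) can be followed numerically in the simplest case, a single missing cell in an additive two-way layout with one observation per cell. The sketch below is a hedged illustration rather than a general implementation: the function names and the 3 × 3 data are hypothetical, the fitted value is taken as row mean + column mean - grand mean, and the scalar $X_u(X'X)^{-1}X_u'$ of (37.86) is taken to be (r + c - 1)/(rc), its value for every cell of a balanced additive layout.

```python
def additive_fit(table):
    """Fitted values (row mean + column mean - grand mean) of the additive model."""
    r, c = len(table), len(table[0])
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]
    grand = sum(row_means) / r
    return [[row_means[i] + col_means[j] - grand for j in range(c)] for i in range(r)]

def estimate_one_missing(table, i, j):
    """Estimate the single missing cell (i, j) by the three stages of 37.54."""
    r, c = len(table), len(table[0])
    work = [list(row) for row in table]
    work[i][j] = 0.0                            # stage (a): analyse with u = 0
    fitted_at_zero = additive_fit(work)[i][j]   # x_u * theta_hat(0)
    leverage = (r + c - 1) / (r * c)            # assumed scalar x_u (X'X)^- x_u'
    u_hat = fitted_at_zero / (1.0 - leverage)   # stage (b): the scalar case of (37.86)
    work[i][j] = u_hat                          # stage (c): re-analyse with u_hat
    return u_hat, work

# Hypothetical data; None marks the missing observation.
table = [[10.0, 12.0, 14.0], [11.0, 15.0, 13.0], [9.0, None, 17.0]]
u_hat, completed = estimate_one_missing(table, 2, 1)
refit_value = additive_fit(completed)[2][1]
```

Here `u_hat` comes out as 14.5, and refitting the completed table returns the same value in the missing cell, which is the fixed-point property (37.84).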


37.55 The estimator $\hat\theta(\hat{u})$ obtained by using (37.86) in the original LS analysis is exactly the estimator which would have been obtained by using the observed values alone (generally in a non-orthogonal analysis). For from (37.83) and (37.84),

$$X'X\,\hat\theta(\hat{u}) = X_z'z + X_u'\hat{u} = X_z'z + X_u'X_u\hat\theta(\hat{u}).$$

Thus

$$X_z'z = (X'X - X_u'X_u)\hat\theta(\hat{u}) = X_z'X_z\,\hat\theta(\hat{u}), \tag{37.88}$$

and (37.88) is precisely the set of equations satisfied by $\hat\theta$ when z alone is analysed. This result, together with the disappearance of the second term on the right of (37.81), implies at once that the Residual SS obtained by using $\hat{u}$ in the original LS analysis is identical with that obtained when z alone is analysed. However, the degrees of freedom for the Residual SS must obviously be reduced, since we now have only (n-m) observations. If X and $X_z$ both have the same rank (e.g. when both have full rank) the Residual SS will have its d.fr. reduced by m, the number of missing observations. More generally, the reduction will be (cf. Exercise 19.8) m minus the difference in rank between X and $X_z$.


37.56 Although the Residual SS requires no adjustment, all the other SS in the AV table will be incorrect if $\hat{u}$ at (37.86) is used in the original LS analysis. This is most easily seen from the fact (cf. Example 35.4, 35.38 and Example 35.6) that each other SS in the AV table may be obtained as the difference between the Residual SS in two linear models, one of which is a restricted form of the other. Evidently, any of these Residual SS is correctly obtainable, by the argument of 37.50-5, by using $\hat{u}$ at (37.86) for that model, but of course $\hat{u}$ will in general differ from that in the full model considered so far. Thus each of these Residual SS will be too large if $\hat{u}$ for the full model is used, since it will not be the correct (minimizing) $\hat{u}$. Hence the other SS in the AV table need correction by the difference between the subtractive corrections to the corresponding Residual SS (or by a single subtractive correction if one of the latter is the Residual SS for the full model). Correspondingly, degrees of freedom must be corrected by the difference between two adjustments of the form discussed in 37.55, but this difference will often be zero.

In the third of his four papers, Wilkinson (1957-60) gives an explicit method of obtaining the subtractive corrections to the other Residual SS, and hence the other SS in the AV table. Fortunately, as Yates (1933) pointed out, the latter corrections, being generally differences of quantities of the same sign, are often small, and the unadjusted SS may be used as approximations.

37.57 Tocher (1952) gives similar methods of analysis for other types of spoilt experiments, namely those in which some observations are irretrievably mixed up and those in which some observations are unwittingly duplicated. Plackett (1950) also discusses the latter situation.


EXERCISES 


37.1 There are G groups of observations, and all observations within a group are normally distributed with common mean and common variance $\sigma_g^2$, the model (37.1) holding except for the homoscedasticity condition below it. Consider the sets of constraints

$C_1$: all the $\sigma_g^2$ are equal (G-1 constraints);

$C_2$: r of the k parameters in θ are zero.

Working in terms of the variable z x — y*J~ n > so that &7‘3) reduces to 

L x (z | M 2 ) = 

show that (37.8) gives , . .. . . 

L(z | A (2) ) = £(# | X)li(z)hW 


-In 


where is the LR test statistic defined at (24.40), and l 2 j 1 ’ wheie F ls the I 

variance-ratio test statistic defined generally at (24.99) and for this case in Example 24.8. (Th is j ; 
result generalizes Exercise 24.6.) (Box and Cox, 1964) 

37.2 Using Exercise 23.7, show that if in (37.8) $l_p$ is distributed free of certain parameters for which there is a complete sufficient (vector) statistic t, and $l_q$ is a function of t alone, $l_p$ and $l_q$ are stochastically independent. Apply this result to establish the independence results in Exercises 24.6 and 24.13. Show in Exercise 37.1 that $l_1(z)$ and $l_2(z)$ are independent when $C_1$ and $C_2$ both hold.

(Cf. Hogg, 1961)


37.3 In fitting orthogonal polynomials of degree k as in 28.16, the reduction in the total 

ft 

SS associated with the term of degree r is Qr — S as (28.72), Vol. 2. Show that 

j =i 

the ratios 

$$Z_r = Q_{k-r+1}\Big/\sum_{s=k-r+2}^{k+1} Q_s, \qquad r = 1, 2, \ldots, k,$$

where $Q_{k+1} = (n-k)s^2$ is the Residual SS, are all independently distributed when the regression coefficients $\alpha_r$ are all zero:

(a) by using the result of Exercise 37.2; and 

(b) by using the result of Exercise 23.27. 

(This result indicates (cf. Hogg (1961)) that one may independently test the regression coefficients 
if one starts from the highest order and works downwards, “ pooling ” the associated SS of 
those adjudged zero with the Residual SS, until one is adjudged non-zero, when the process 
stops. All the tests are, of course, t 2 ( F) tests, and the overall test has size 1 — (1 — a) fc ~ k f 
if a test of size α is used at each stage. T. W. Anderson (1962) shows under weak assumptions
that this procedure maximizes the probability of correctly locating a non-zero coefficient.) 

37.4 In 37.10, show that for the binomial distribution of 5.4, (37.14) gives the variance-stabilizing transformation

$$u(x) = \arcsin\sqrt{x/n},$$

where x/n is the observed proportion of "successes."

(Anscombe (1948) shows that better variance stabilization is obtained if x/n is replaced by $(x+\tfrac38)/(n+\tfrac34)$. Freeman and Tukey (1950) suggest

$$\arcsin\sqrt{\frac{x}{n+1}} + \arcsin\sqrt{\frac{x+1}{n+1}}.$$

(Cf. Example 37.1).)

37.5 In 37.10, show that for the negative binomial distribution of 5.15, (37.14) gives the
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. ^ 0 * transformation u(t) = ar sinh {(?)*}. where is ^ ^ 

portion of “ S ““ ' tt«) shows that better variance 

f Sfcr" i£ * /n is " * 

i 

j 37 6 in Exercise 37.4, show that the alternative transformation uQ = l og j- / A 

ta biliz es the varian ^^)? = Sh ° W that thlS transformati on is strictly appropriate when 

(37.9) is ^ (Cf. Bartlett, 1947a) 

37.7 For a cross-classification with unequal cell frequencies, show that if the cell means are analysed as single observations, their average variance may be estimated by $s^2/H$, where $s^2$ is the Residual MS of the original observations and H is the harmonic mean of the cell frequencies. Hence show how an approximate AV of the cell means may be carried out.

(Cf. Scheffé, 1959)

37.8 Applying the method of Exercise 37.7 to the numerical data of Exercise 35.7, show that the approximate AV for cell means is

                           SS      d.fr.    MS
  Between sexes           0.023      1     0.023
  Between breeds          0.020      7     0.003
  Interactions            0.006      7     0.001
  Total (between cells)   0.049     15
  Residual                         517     0.0017 (= s²/H)

Compare the values of the F-ratios in this table with the exact values in Exercise 35.7.

37.9 Show that if a linear model contains terms $x_i^{m_i}$, $m_i \neq 0$, we have approximately

$$x_i^{m_i} = x_i + (m_i - 1)\,x_i \log x_i.$$

Hence show how $m_i$ can be estimated. The process may be iterated.

(Box and Tidwell, 1962)

37.10 In 37.28, show that $M = X(X'X)^{-1}X'$ is invariant under any non-singular transformation W = XT. Hence, taking W'W to be diagonal, show that $m = \sum_r M_{rr}^2$ satisfies

f 

ft,. \ \JN-\)p(p + 2) 

(Box and Watson, 1962) 

m = v" 1 * by considering the variance of the diagonal elements Mrr show 

r "r ''P /N. Using the invariance property of Exercise 37.10, suppose t a ’ 

and adjoin further columns> one of which is N ~n, to X to form an orthogonal matrix. 




i 


2 +ss 

l r?n / „• i ^ i = 1 
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from these derivations that the lower bound ^ 

(Box and Watson, 1962) 

, , in fT r oup-frequencies nj in the one-way classifies | 

37.12 In 37.32, show that if we choose (p + )g 


Hence show that m<p(N 1)/N. Show 
but not the upper, can be attained. 


tion so that 




1 p 2 + 4p+J 


N 


N +1 


1 


2 /vw+Tr n+i - . 

i =1 nj v . v»(W) for non-normality, whatever th e , 

Cx at (37.52) is zero and there is no correc ion N = (l-r)iV, show that C x = 0 j 

underlying distribution of the errors. Iff.-1, and Hr 

when , /N-2\' 

r = iU±' 


- - > 


3 N 


Ki± 3_l )> 


and that if N = 12, the optimal integer group-frequencies aie 9 and j3. ^ Watsori) 1962) 


37.13 Establish (37.71-2) by writing U defined below (37.70) in the form

$$U = \sum_{i \neq l}\sum R_{il},$$

where

$$R_{il} = \sum_{j=1}^{c} y_{ij}\,y_{lj},$$

and showing that

$$E_P(R_{il}) = 0, \qquad E_P(R_{il}^2) = (c-1)\,(k_2)_i\,(k_2)_l,$$

and hence

$$E_P(U) = 0, \qquad E_P(U^2) = 2(c-1)\sum_{i \neq l}\sum (k_2)_i\,(k_2)_l.$$

(Pitman, 1938)


37.14 In 37.41 show that when (37.75) holds,

$$V_P(W) = \frac{2(r-1)}{r^{3}(c-1)},$$

and hence by the method of 37.29 that the d.fr. of the approximate F-test should be adjusted to

$$\nu_1 = (c-1) - \frac{2}{r}, \qquad \nu_2 = (r-1)\nu_1.$$

Show that when the ranks 1, 2, ..., c in each row are used as conventional numbers, with $R_j$ as the sum of the ranks in the $j$th column and $T = \sum_j \{R_j - \tfrac12 r(c+1)\}^2$, the statistic W reduces to

$$W = \frac{12T}{r^{2}c(c^{2}-1)},$$

and that as $r \to \infty$,

$$r(c-1)W = \frac{12T}{rc(c+1)}$$

has a $\chi^2$ distribution with (c-1) d.fr.

(Friedman (1937); Kendall and Babington Smith (1939); Friedman (1940) compares the F and $\chi^2$ approximations.)
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37.1, expand 

sample * „ / t _«\i 


:«i se 


•ries 


4 show 


«.W-(•+«)• (i+^‘ 

£(«») = ( 9 + c )‘-i 0 ''+^^e-s+o(e-S), 


var Me 


n f„ , 3 — 8c , 32c 2 —52c + 17 >> 

= i \ 1+ “8T + -329*- +0 < a ' ! )}. 


s 

choice c = I removes the term of order 6 ~ l in the variance, reducing 

/ A \ 


Hence 


show that 


var M 3 /8 ~ifl +T _ . 


{E(Mc )i*-c~ 9 -l-y£ 


•( the inverse transformation is used on u c to obtain an estimator of 0, its downward 
so that it tn_ 


s ® . ne arly constant at ■$. 

is (The c = f result is due to A. H. L. Johnson; cf. Anscombe (1948).) 

1716 In Exercise 37.15, show that the coefficients of skewness and kurtosis of u c (t) ar 


Y i = “ 


y 2 = 


25 -48c 


945-1536c 
2560 


+ o(0-i), 


+ o(0- 2 ), 


compared with

$$\gamma_1 = \theta^{-1/2}, \qquad \gamma_2 = \theta^{-1},$$

for the original Poisson variable t. Thus, whatever c is chosen, $\gamma_1$ is approximately halved (with changed sign) and $\gamma_2$ is unaffected to the first order.

(Anscombe, 1948.)

37.17 Using the result for var n c in Exercise 37.15, show that the transformation 

u's = (i + ^ + d) 1 + (* + £ — d)* 

has variance 

, „ 1 16<5 2 -1 , /A _„ 

var “* = 1_ 80 +_ 329 r ' 6 ’ 

so that if we choose <5 = \ to give u of Example 37.1, 

var “1 = 

37.18 For the two-way cross-classification with one observation per cell, where there are no interactions, show that a median test for column-effects is obtained by counting the number of observations in the jth column which exceed the row median, and forming a (2 × c) table like (37.76), with (37.77) as the large-sample test statistic.

betelieda^ the gener . al balanced two-way cross-classification^ n ^ h ^ )1 ^ t g C the test of Exercise 

gainst interactions by finding the median in each ce 
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)VANCED tr ibuti orl ' fr 

... the ««t ^ ainS tRrown and Mood, 1951) 


«» . Sh c, W *e tt he«es t —- (Brown and Mood, 1951) 

17 18 to these inedw"* in r0 ws. - be tested jointly 

tows, hu, no. - an d the ith ro > 

ihat column-efl /)th cell . • eqU ivalent to the 

,7 20 In Exercise 37M J^e m/ observation »>“ show that ‘ t J, e columns and layers 

and Mood> 195,) 

.0 a iar 8 e satnple 


are 







CHAPTER 38

THE DESIGN OF EXPERIMENTS

38.1 For the greater part of this work, we have been concerned with the problems of the analysis of observations, principally the problems of estimation and hypothesis-testing which appear in various theoretical contexts. In a very obvious way, every investigation of a method of analysis carries its own lesson for the future: when we learn that a particular method of estimation is more efficient than another, the implication is that we should use the better method in future. However, this implication alone would not lead us to modify the method of making observations, but merely to modify the analysis of the observations. In this chapter and those following, we shall be discussing questions of design, by which we mean considerations affecting the method of making, or collecting, the observations to be analysed.

38.2 Design considerations are not entirely new to us. In Example 28.4 we discovered from the analysis of a simple linear regression problem that by choosing the origin of measurement and the values of the regressor in a certain way, we could ensure an orthogonal analysis and also minimize the sampling variances of our estimators. This is a design question, because it relates to how the observations are to be made. In the same Example, we remarked the hazard that this optimum solution removes the possibility of checking the assumption that the regression model was linear in the value of the regressor. Unless we were very sure on this point, we should probably "hedge" slightly by departing from the optimum choice of regressor values.

Again in 37.21, the results of robustness studies implied that equal frequencies should be used in all cells of an experiment. Once more, this is a design question since it affects the method of making observations.

38.3 In this chapter we shall discuss questions of design as they affect experimentation, largely using the linear models and AV techniques of Chapters 35-7. In Chapters 39-40, we shall turn to design problems in sample surveys. The distinction between these fields is fairly clear-cut, and may be expressed by saying that in surveys we make observations on a sample taken from a finite population of individuals, whereas in experiments we make observations which are in principle generated by a hypothetical infinite population, in exactly the way that the tosses of a coin are (cf. Example 38.1 below). Of course, we may sometimes experiment on the members of a sample resulting from a survey, or even make a sample survey of the results of an (extensive) experiment, but the essential distinction between the two fields should be clear.

Cochran (1965) gives an interesting general discussion of problems which arise particularly in surveys rather than in controlled experiments.

Principles of experimentation

38.4 Classical discussions of the principles of experimentation emphasized the importance of varying the (supposedly) causal factors in an experiment in order to observe the effect upon the dependent variable being studied. In two respects, however, these discussions are now generally seen to have been inadequate. Firstly, they tended to be phrased in terms of the variation of a single causal factor at a time, rather than of all the causal factors in combination. Thus, J. S. Mill's (1843) fifth canon of experimental enquiry states:

“Whatever phenomenon varies in any manner whenever another phenomenon 
varies in some particular manner is either a cause or an effect of that phenomenon, 
or is connected with it through some fact of causation.” 

In the light of the results of Chapter 35, we see now that a “ one-at-a-time ” 
approach can have no hope of evaluating the interactions between causal factors. 
Not only does this deprive us of essential knowledge of the linkages between causal 
factors: it may actually be positively misleading. For suppose that it is the purpose 
of an experiment to find which combination of ingredient A and ingredient B gives 
the highest resistance to breakage in a ceramic product. If we find the dose of ingredient 
A which gives highest resistance, and the dose of B which does so, it is by no means 
true that if we combine these values we have arrived at the optimum combination 
sought, as the reader may easily convince himself numerically. Interaction between 
the factors, which can produce effects like this, can only be studied by varying them 
simultaneously. 
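A numerical instance of this failure is easily constructed. The sketch below is purely illustrative: the response function, its coefficients and the dose levels are hypothetical, chosen so that the two ingredients interact strongly.

```python
def strength(a, b):
    """Hypothetical breakage resistance with a strong negative A x B interaction."""
    return 10 + 3 * a + 3 * b - 4 * a * b

doses = [0, 1, 2]

# One factor at a time: vary A with B held at 0, then B with A held at 0.
best_a_alone = max(doses, key=lambda a: strength(a, 0))
best_b_alone = max(doses, key=lambda b: strength(0, b))
one_at_a_time = strength(best_a_alone, best_b_alone)

# Varying both factors simultaneously over the full grid of combinations.
joint_best = max(strength(a, b) for a in doses for b in doses)
```

Each ingredient taken alone does best at dose 2 (resistance 16), yet the combination (2, 2) gives only 6, the lowest value on the grid, while the true optimum of 16 is found only by examining the combinations themselves.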

38.5 The second inadequacy of the classical discussions is even more radical 
and is again illustrated by the quotation from J. S. Mill in 38.4. It arises from the danger 
of attributing to one or more of the experimental factors, effects upon the dependent 
variable which are in reality due to variations in some causal factors not included in 
the experiment. An unrecognized causal factor may (unknown to the experimenter) 
vary during the course of the experiment in such a way as to favour a particular com¬ 
bination of experimental factors; this combination will then appear to be highly effective 
when it is really the unrecognized factor which is producing the good results.
The classical discussions had no solution to this problem, and it is essential to
realize how deep-seated and ever-present the problem is. We can never be quite
sure that all the important, or even the most important, causal factors have been
incorporated in the structure of the experiment. Some may be quite unknown; others,
although known, may wrongly be considered to be of minor importance and deliberately
neglected. We always ...

Randomization 

38.6 The solution to the problem of 38.5 was propounded by R. A. Fisher and
expounded in his book (Fisher (1935)). We have seen throughout this work that his
contributions to statistical theory were remarkable and far-ranging. Nevertheless,
it is probably no exaggeration to say that his advocacy of randomization in
experiment design was the most important and the most influential of his
achievements in statistics.

38.7 The principle of randomization is simply stated. Whenever experimental units
are allocated to factor-combinations in an experiment, this should be done by a random
process, and the same process should be applied to each eligible experimental unit.

Of course, even if we randomize in accordance with this principle, the particular
allocation of experimental units which we make may still work in favour of particular
factor-combinations. However, the difficulty of 38.5 no longer troubles us if we
incorporate the process of randomization into the framework within which our inferences
are made. The hypothetical population within which we now infer includes every
possible pattern of allocation of experimental units which the randomization could
have produced. Within this population, by the very nature of the randomization
process, the effects of factors outside the experiment can show no favour to the factors
within it, and our inferences are free from bias.
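
A minimal sketch of this argument, under assumed numbers: six units carry the values of an unrecognized factor (the list `hidden` is hypothetical), there is no treatment effect at all, and three units are to be treated. Averaged over every allocation that the randomization could have produced, the apparent treatment difference is exactly zero, whereas a self-selected allocation is biased.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical values of an unrecognized causal factor; no real treatment effect.
hidden = [3, 8, 1, 9, 4, 5]
units = range(len(hidden))

def diff(treated):
    """Difference of group means (treated minus control) for one allocation."""
    control = [u for u in units if u not in treated]
    mean_t = Fraction(sum(hidden[u] for u in treated), len(treated))
    mean_c = Fraction(sum(hidden[u] for u in control), len(control))
    return mean_t - mean_c

# Every allocation of 3 of the 6 units to treatment, each equally probable.
allocations = list(combinations(units, 3))
diffs = [diff(a) for a in allocations]
average_diff = sum(diffs) / len(diffs)

# Self-selection: the three units with the largest hidden values take the treatment.
self_selected = tuple(sorted(units, key=lambda u: hidden[u])[-3:])
biased_diff = diff(self_selected)

print(len(allocations), average_diff, biased_diff)
```

Exact rational arithmetic is used so that the zero average over the randomization population is seen exactly rather than to rounding error.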

Even if the relationship of the dependent variable with some unsuspected causal
factor is not recognized until after the experiment, the validity of the inferences will
not be impaired, provided that that factor's influence was “randomized out” of the
experiment.


38.8 Thus, the problem of 38.5 is solved by changing the inferential base. Necessarily
this has the effect of changing the theoretical basis of our inference, and we shall
develop this point shortly. First, we illustrate by a simple example.




Example 38.1 

An experiment is to investigate the dependence of reaction-time in male automobile
drivers upon the alcohol content of their blood. The drivers taking part are to consume
measured doses of alcohol and, after a fixed time-lapse, to undergo a blood-alcohol
test and certain standardized tests of reaction-times. The problem is how the drivers
are to be allocated to the different alcohol-doses.

This is intrinsically a regression problem, with reaction-time as dependent variable
(y) and blood-alcohol content as regressor (x), but it should be observed that x is not
strictly under control: we can only control the alcohol-dose (z), and we must observe
the value of x in each case. However, z and x will be in fairly close relationship,
and it is reasonable to assume that to each fixed value of z there corresponds a closely
grouped set of values of x. If the z-values are sufficiently far apart, the groups
will not overlap, and we can treat the problem as a one-way classification, the
classification being indexed by the values of z.

In this example, it is not difficult to see the problems that arise in the absence
of a randomized allocation of alcohol-doses to drivers. Suppose, for example, that
the drivers were allowed themselves to choose their doses of alcohol. Presumably,
hard drinkers would choose larger doses than other drivers. Since normal drinking
habits affect one's tolerance of alcohol, the results of the reaction-time tests might
be to mask the true differences between the effects of the various alcohol-doses.

It may be argued that allowing the drivers to choose their own doses gives a
truer picture of what happens in normal driving practice. Even if this is true, it is
important to realize that it is irrelevant to the scientific enquiry undertaken. The
essential ... the former attempts to delineate ... the experiment is to investigate
relationships which ... within the experiment ...

38.9 The fact that an “outside” influence, such as normal drinking habits in Example
38.1, has been removed by randomization in no way prevents us from analysing its
effect after the (randomized) experiment is completed. In the case of Example 38.1,
it would presumably be a difficult matter to ascertain at all accurately the normal
drinking habits of the drivers ... the time of day at which the tests were taken.
Whether or not this has been “randomized out” of the experiment, there is nothing to
prevent our carrying out a regression analysis of reaction-time upon time of day. If
it were found, say, that tests taken later ..., this would be a matter for investigation
... procedure on future occasions.

38.10 Our statement of the virtues of the randomization principle does not imply
that all randomized experiments leave nothing to be desired. Consider the
effect of time of day upon reaction-times, as in 38.9, and suppose that all the tests in
the experiment were carried out at 6 p.m., the end of the day's work for the drivers
taking part. In effect, the factor “time of day” is then constant at one level, and the
experiment is vulnerable to the possibility that this factor interacts with blood alcohol- 
content in its effect upon reaction-times. The randomization with respect to alcohol- 
doses does not help at all in this respect, and the criticism of classical procedure in 
38.5 applies here. In fact, the randomization has been incomplete, because time of 
day has been neglected as a possible causal factor. Randomization can only confer 
inferential benefits within the sphere to which it has been applied. 

38.11 It will be clear, then, that the factors influencing the dependent variable 
in any experiment are, explicitly or implicitly, divided by the experimenter into three 
classes: 

(1) those incorporated into the structure of the experiment (alcohol-dose in Example 
38.1); 

(2) those “randomized out” of the experiment (normal drinking habits in Example
38.1);
(3) those neither incorporated nor randomized out (time of day in 38.10). 

The first two of these classes require positive action, affecting the actual layout
of the experiment or the randomization procedure employed. By contrast, the last of
our three classes is a residual one. It is true that the experimenter may deliberately
decide not to remove its possible effects: in Example 38.1, ... of the driver is almost
certainly negligible in this way. However, a factor may fall into class (3) simply from
being overlooked, like time of day in 38.10.

A substantial part of the skill of the experimenter lies in his choice of factors to
be randomized out of the experiment. If he is careful, he will randomize out all the
factors which are suspected to be causally important but which are not actually part
of the experimental structure. But every experimenter necessarily neglects some
conceivably causal factors; if this were not so, the randomization procedure required
would be impossibly complicated. Thus the choice of factors to be randomized out
is essentially a matter of judgement.
38.12 We saw in 38.7-8 that the population, within which inferences from a
randomized experiment may validly be made, is a hypothetical one depending upon
the process of randomization itself. The experimenter, however, must apply his
inferences to the real world. In Example 38.1, the hypothetical population for the
inferences from the experiment includes every possible allocation of alcohol-doses
to the drivers taking part. How far could these inferences be extended to cover
the larger populations of

(a) all male automobile drivers; 

(b) all automobile drivers, females included ? 

If the drivers taking part in the experiment were a random sample (not necessarily
simple random) of all male drivers, few experimenters would hesitate to generalize the
findings of the experiment to population (a). Similarly, few experimenters would be
rash enough to generalize to population (b) without further knowledge, perhaps from
other experiments on female drivers. However, suppose that the drivers taking part
were selected from the employees of one corporation. Even if they were randomly
so selected, this would only give us comfort in generalizing to the limited population
of all drivers employed by that corporation. If (as is commonly the case) the
corporation was not chosen by any other process than its own self-interest or its
willingness to co-operate with some scientific body, further generalization of the
results of the experiment is a matter of judgement. Only in so far as the experimental
material is, or is judged to be equivalent to, a random sample from a larger population
may we generalize the experimental results to that population.

There is ultimately no escape from the use of judgement in this connexion, for
there are always the problems of generalization in space (e.g. to other countries) and
in time.


We have dealt in a very compressed way with some of the fundamental
questions of experimental inference. For a fuller (and largely non-mathematical)
discussion, the reader is recommended to read the book by D. R. Cox (1958a), as well
as Fisher (1935).

... by randomizing out such effects ... experimental structure. In practice, the
experimental units themselves cannot be physically identical. Thus, agricultural
experiments are carried out on plots of land which (however close together) will not
have identical fertility characteristics. What is more, as the number of plots required
for the experiment increases, the variation in their fertility will probably also increase.
Thus it appears advisable to use a number of small groups of rather similar plots
rather than one large group of heterogeneous plots. Similarly, genetic considerations
... members of the same litter ..., but litters are generally of rather small size. The
individual animal, like the plot in the agricultural example, is the experimental
unit; while the litters, like the groups of similar plots, are called blocks of
experimental units. ... experiments are block experiments ... (1952).

38.15 Suppose that an experiment is carried out in $b$ blocks of $k$ experimental
units each, so that there are $bk$ observations in all.
The experiment is to investigate a cross-classification or hierarchical classification
(or a mixture of these) of certain factors of interest. Suppose that there are $t$ distinct
cells in the classification; for a two-way classification with $r$ rows and $c$ columns, e.g.,
we have $t = rc$. We shall call these the $t$ “treatments.” The problem is how to
allocate the $t$ treatments to the $k$ units in each block.

We shall assume that no treatment is to be allocated to more than one unit in each
block.(*) This is a reasonable assumption, since there seems little point in duplicating
a treatment within a block, rather than using it in a further block, if more observations
are required. This assumption implies that $t \geq k$.

38.16 The experiment may be completely described by defining a treatment
matrix of order $(k \times t)$ for each block. For the $j$th block, $\mathbf{t}_j$ has its $(l, i)$th element
equal to 1 or 0 according as the $i$th treatment is or is not allocated to the $l$th unit in
that block. Evidently, there will be one non-zero element in each row (since one
treatment is allocated to each unit) and no more than one in each column (because of the
last assumption in 38.15).

If we require only to describe the allocation of treatments to blocks, without reference
to their allocation to units within blocks, we can condense the information from the
$b$ treatment matrices $\mathbf{t}_j$ into the incidence matrix of the experiment, $\mathbf{n}$, of order
$(t \times b)$, which has its $(i, j)$th element $n_{ij}$ equal to 1 or 0 according to whether
or not the $i$th treatment occurs in the $j$th block.

(*) This assumption, which is satisfied by all well-known experiment designs, may be relaxed
(cf. Tocher (1952)).


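38.17 If $\mathbf{n}_j$ is the $j$th column of $\mathbf{n}$, and $\mathbf{1}_k$ is a $(k \times 1)$ vector of units, we have

$$\mathbf{t}_j' \mathbf{1}_k = \mathbf{n}_j, \tag{38.1}$$

for this is simply summing each column of $\mathbf{t}_j$. Also, since each row of $\mathbf{t}_j$ has exactly
one non-zero element, we have

$$\mathbf{t}_j' \mathbf{t}_j = \operatorname{diag}(\mathbf{n}_j), \tag{38.2}$$

where $\operatorname{diag}(\mathbf{z})$ means a diagonal matrix with the vector $\mathbf{z}$ as diagonal.

If the $i$th treatment occurs $r_i$ ($> 0$) times in the experiment as a whole, and $\mathbf{r}$ is the
$(t \times 1)$ vector of the $r_i$, summing each row of $\mathbf{n}$ gives

$$\mathbf{n} \mathbf{1}_b = \mathbf{r}, \tag{38.3}$$

and summing each column gives

$$\mathbf{n}' \mathbf{1}_t = k \mathbf{1}_b, \tag{38.4}$$

since there are $k$ units in each block. Further, (38.2-3) give

$$\sum_j \mathbf{t}_j' \mathbf{t}_j = \sum_j \operatorname{diag}(\mathbf{n}_j) = \operatorname{diag}\Big(\sum_j \mathbf{n}_j\Big) = \operatorname{diag}(\mathbf{n} \mathbf{1}_b) = \operatorname{diag}(\mathbf{r}). \tag{38.5}$$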
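
The relations (38.1)-(38.5) are easily checked numerically. The following is a minimal sketch for a hypothetical design (four treatments in four blocks of three units; the particular allocation is an assumption made only for illustration), using NumPy.

```python
import numpy as np

# Hypothetical design: t = 4 treatments in b = 4 blocks of k = 3 units.
t, k = 4, 3
blocks = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]  # treatment given to each unit
b = len(blocks)

def treatment_matrix(block):
    """(k x t) matrix whose (l, i)th element is 1 if unit l receives treatment i."""
    tj = np.zeros((k, t), dtype=int)
    for unit, treatment in enumerate(block):
        tj[unit, treatment] = 1
    return tj

t_mats = [treatment_matrix(block) for block in blocks]

# Incidence matrix n (t x b): column j is t_j' 1_k, as in (38.1).
n = np.column_stack([tj.T @ np.ones(k, dtype=int) for tj in t_mats])
r = n @ np.ones(b, dtype=int)  # replication vector, (38.3)

# (38.2): t_j' t_j is diagonal with n_j on the diagonal.
ok_38_2 = all(np.array_equal(tj.T @ tj, np.diag(n[:, j])) for j, tj in enumerate(t_mats))
# (38.4): each block contains k units.
ok_38_4 = np.array_equal(n.T @ np.ones(t, dtype=int), k * np.ones(b, dtype=int))
# (38.5): the sum of the t_j' t_j is diag(r).
ok_38_5 = np.array_equal(sum(tj.T @ tj for tj in t_mats), np.diag(r))

print(n, r, ok_38_2, ok_38_4, ok_38_5)
```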

38.18 In accordance with the spirit of our earlier discussion, the allocation of
treatments to units will be randomized independently within each block, but we shall
not for the present consider the effect of this within-blocks randomization upon the
inferences drawn from the experiment; we return to this in 38.41. Here, we regard
the randomization as a general precautionary measure against bias, and we conduct
our analysis in terms of the linear model (Model I) familiar from earlier chapters.

Linear model for block experiments

38.19 An obvious linear model for a block experiment is

$$E(y_{ij}) = \tau_i + \beta_j, \tag{38.6}$$

where the $\tau_i$ are treatment effects and the $\beta_j$ are block effects. Here, we are assuming
that treatment and block effects are additive, with no interactions.

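38.20 We saw at (19.19) that the only linear functions of parameters which can be
unbiassedly estimated by linear functions of the observations are linear functions
of their expectations. Thus, in (38.6), only linear combinations of the $(\tau_i + \beta_j)$ may
be so estimated, as is obvious from the fact that any constant added to all the $\tau_i$ and
subtracted from all the $\beta_j$ would leave (38.6) unaffected. We remove this lack of
identifiability of the $\tau_i$, as we did in the singular model in Vol. 2 and Example
19.9, by introducing a linear constraint upon the parameters. Since the block effects
$\beta_j$ are nuisance parameters, it is natural to impose the constraint upon them alone, in
the form

$$\sum_{j=1}^{b} \beta_j = 0.$$

In effect, we add to the observations a dummy random variable (zero), whose “expectation”
is the sum of the $\beta_j$.

To express the $b$ block effects in terms of $(b-1)$ independent parameters, we take
an orthogonal $(b \times b)$ matrix $\mathbf{U}_0$ with first row $b^{-1/2}\mathbf{1}_b'$ and its remaining $(b-1)$
rows arbitrary, say $\mathbf{u}$. Then

$$\mathbf{U}_0 \boldsymbol{\beta} = \begin{pmatrix} b^{-1/2}\mathbf{1}_b' \boldsymbol{\beta} \\ \mathbf{u}\boldsymbol{\beta} \end{pmatrix} = \begin{pmatrix} 0 \\ \mathbf{u}\boldsymbol{\beta} \end{pmatrix}, \tag{38.7}$$

since the constraint on the $\beta_j$ is

$$\mathbf{1}_b' \boldsymbol{\beta} = 0. \tag{38.8}$$

Because $\mathbf{U}_0$ is orthogonal, (38.7) gives

$$\boldsymbol{\beta} = \mathbf{U}_0' \begin{pmatrix} 0 \\ \mathbf{u}\boldsymbol{\beta} \end{pmatrix} = \mathbf{u}'\mathbf{u}\boldsymbol{\beta}. \tag{38.9}$$

Thus if we define

$$\boldsymbol{\alpha} = \mathbf{u}\boldsymbol{\beta}, \tag{38.10}$$

(38.7) and (38.9) become

$$\mathbf{U}_0 \boldsymbol{\beta} = \begin{pmatrix} 0 \\ \boldsymbol{\alpha} \end{pmatrix}, \qquad \boldsymbol{\beta} = \mathbf{u}'\boldsymbol{\alpha}. \tag{38.11}$$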


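38.21 We now proceed to the LS analysis. Write $\mathbf{y}$ for the $(bk \times 1)$ vector containing
the $b$ vectors $\mathbf{y}_j$ of observations in the separate blocks, and $\mathbf{u}_j$ for the $j$th column
of $\mathbf{u}$, of order $(b-1)$. Then (38.6) may be written

$$\mathbf{y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\epsilon}, \tag{38.12}$$

where $\mathbf{X}$ and $\boldsymbol{\theta}$ are conformably partitioned, with

$$\underset{bk \times (t+b-1)}{\mathbf{X}} = \begin{pmatrix} \mathbf{t}_1 & \mathbf{1}_k \mathbf{u}_1' \\ \vdots & \vdots \\ \mathbf{t}_b & \mathbf{1}_k \mathbf{u}_b' \end{pmatrix} \tag{38.13}$$

and

$$\underset{(t+b-1) \times 1}{\boldsymbol{\theta}} = \begin{pmatrix} \boldsymbol{\tau} \\ \boldsymbol{\alpha} \end{pmatrix}. \tag{38.14}$$

The errors in (38.12) are as usual assumed to have mean zero, variance $\sigma^2$, and to
be uncorrelated.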


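38.22 From (38.13), we have

$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} \sum_j \mathbf{t}_j'\mathbf{t}_j & \sum_j \mathbf{t}_j'\mathbf{1}_k\mathbf{u}_j' \\ \sum_j \mathbf{u}_j\mathbf{1}_k'\mathbf{t}_j & \sum_j \mathbf{u}_j\mathbf{1}_k'\mathbf{1}_k\mathbf{u}_j' \end{pmatrix} = \begin{pmatrix} \operatorname{diag}(\mathbf{r}) & \mathbf{n}\mathbf{u}' \\ \mathbf{u}\mathbf{n}' & k\mathbf{I}_{b-1} \end{pmatrix}, \tag{38.15}$$

using (38.5), (38.1) and the relations $\mathbf{1}_k'\mathbf{1}_k = k$, $\mathbf{u}\mathbf{u}' = \mathbf{I}_{b-1}$. We may now invert
(38.15). Writing

$$\boldsymbol{\Omega} = \{\operatorname{diag}(\mathbf{r}) - \mathbf{n}\mathbf{u}'\mathbf{u}\mathbf{n}'/k\}^{-1}, \tag{38.16}$$

we find

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{pmatrix} \boldsymbol{\Omega} & -\boldsymbol{\Omega}\mathbf{n}\mathbf{u}'/k \\ -\mathbf{u}\mathbf{n}'\boldsymbol{\Omega}/k & \{\mathbf{I}_{b-1} + \mathbf{u}\mathbf{n}'\boldsymbol{\Omega}\mathbf{n}\mathbf{u}'/k\}/k \end{pmatrix}, \tag{38.17}$$

as may be verified by multiplication of (38.15) and (38.17). Also

$$\mathbf{X}'\mathbf{y} = \begin{pmatrix} \sum_j \mathbf{t}_j'\mathbf{y}_j \\ \sum_j \mathbf{u}_j\mathbf{1}_k'\mathbf{y}_j \end{pmatrix} = \begin{pmatrix} \mathbf{T} \\ \mathbf{u}\mathbf{B} \end{pmatrix}, \tag{38.18}$$

where $\mathbf{T} = \sum_j \mathbf{t}_j'\mathbf{y}_j$ and $T_i$ is the total of $y$ for all units receiving the $i$th treatment,
while $\mathbf{B} = (B_1, \ldots, B_b)'$, where $B_j = \mathbf{1}_k'\mathbf{y}_j$ is the total of $y$ for all units in the $j$th block.

From (38.17-18), the LS estimators are

$$\hat{\boldsymbol{\theta}} = \begin{pmatrix} \hat{\boldsymbol{\tau}} \\ \hat{\boldsymbol{\alpha}} \end{pmatrix} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y},$$

where

$$\hat{\boldsymbol{\tau}} = \boldsymbol{\Omega}(\mathbf{T} - \mathbf{n}\mathbf{u}'\mathbf{u}\mathbf{B}/k), \tag{38.19}$$

$$\hat{\boldsymbol{\alpha}} = \mathbf{u}(\mathbf{B} - \mathbf{n}'\hat{\boldsymbol{\tau}})/k. \tag{38.20}$$

From (38.20) and (38.11), we then obtain for the original block parameters $\boldsymbol{\beta}$,

$$\hat{\boldsymbol{\beta}} = \mathbf{u}'\hat{\boldsymbol{\alpha}} = \mathbf{u}'\mathbf{u}(\mathbf{B} - \mathbf{n}'\hat{\boldsymbol{\tau}})/k. \tag{38.21}$$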

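38.23 We now simplify the estimators (38.19) and (38.21) for computational
purposes. First, since $\mathbf{U}_0$ in 38.20 is orthogonal, we have

$$\mathbf{I}_b = \mathbf{U}_0'\mathbf{U}_0 = (b^{-1/2}\mathbf{1}_b)(b^{-1/2}\mathbf{1}_b') + \mathbf{u}'\mathbf{u},$$

so

$$\mathbf{u}'\mathbf{u} = \mathbf{I}_b - \mathbf{1}_b\mathbf{1}_b'/b. \tag{38.22}$$

Substitution of (38.22) into (38.16) gives, remembering (38.3),

$$\boldsymbol{\Omega} = \{\operatorname{diag}(\mathbf{r}) - \mathbf{n}\mathbf{n}'/k + \mathbf{r}\mathbf{r}'/(bk)\}^{-1}, \tag{38.23}$$

and (38.19) similarly becomes

$$\hat{\boldsymbol{\tau}} = \boldsymbol{\Omega}\{\mathbf{T} - \mathbf{n}\mathbf{B}/k + \mathbf{r}G/(bk)\}, \tag{38.24}$$

where we have written

$$G = \mathbf{1}_b'\mathbf{B} = \mathbf{1}_t'\mathbf{T} \tag{38.25}$$

for the grand total of all the observations $y$. (38.24) may be further simplified, for
(38.23) gives

$$\boldsymbol{\Omega}^{-1}\mathbf{1}_t = \mathbf{r} - \mathbf{n}(\mathbf{n}'\mathbf{1}_t)/k + \mathbf{r}(\mathbf{r}'\mathbf{1}_t)/(bk). \tag{38.26}$$

Using (38.3-4) and their consequence

$$\mathbf{r}'\mathbf{1}_t = bk, \tag{38.27}$$

the last two terms in (38.26) cancel, and $\boldsymbol{\Omega}^{-1}\mathbf{1}_t = \mathbf{r}$, or

$$\boldsymbol{\Omega}\mathbf{r} = \mathbf{1}_t. \tag{38.28}$$

Thus (38.24) becomes

$$\hat{\boldsymbol{\tau}} = \boldsymbol{\Omega}(\mathbf{T} - \mathbf{n}\mathbf{B}/k) + \mathbf{1}_t G/(bk). \tag{38.29}$$

Now we substitute (38.22) into (38.21) to obtain

$$\hat{\boldsymbol{\beta}} = (\mathbf{B} - \mathbf{n}'\hat{\boldsymbol{\tau}})/k - \mathbf{1}_b\mathbf{1}_b'(\mathbf{B} - \mathbf{n}'\hat{\boldsymbol{\tau}})/(bk). \tag{38.30}$$

This can again be simplified, for, using (38.25) and (38.3),

$$\mathbf{1}_b'(\mathbf{B} - \mathbf{n}'\hat{\boldsymbol{\tau}}) = G - \mathbf{r}'\hat{\boldsymbol{\tau}} = G - \mathbf{r}'\boldsymbol{\Omega}(\mathbf{T} - \mathbf{n}\mathbf{B}/k) - \mathbf{r}'\mathbf{1}_t G/(bk) \tag{38.31}$$

on substituting (38.29). (38.28) implies $\mathbf{r}'\boldsymbol{\Omega} = \mathbf{1}_t'$. If we now use (38.25), (38.4) and
(38.27) in (38.31), we find that it reduces to zero. Thus (38.30) becomes simply

$$\hat{\boldsymbol{\beta}} = (\mathbf{B} - \mathbf{n}'\hat{\boldsymbol{\tau}})/k. \tag{38.32}$$

Apart from the calculation of $\boldsymbol{\Omega}$ at (38.23), the estimation of treatment and block
effects thus requires only the totals $\mathbf{T}$ and $\mathbf{B}$ of (38.18) and $G$ of (38.25).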
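
As a numerical sketch of the estimators (38.23), (38.29) and (38.32), the following NumPy fragment evaluates them for a hypothetical incomplete block design with invented observations, and compares them with a direct least-squares fit in which the constraint on the block effects is imposed through a dummy zero observation, in the manner described in 38.20. The design, the data and all variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical design and data: t = 4 treatments, b = 4 blocks of k = 3 units.
t, k = 4, 3
blocks = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
b = len(blocks)
y = np.array([[12.0, 15.0, 11.0],
              [14.0, 16.0, 9.0],
              [13.0, 10.0, 8.0],
              [17.0, 12.0, 10.0]])  # y[j, l] is the observation on unit l of block j

n = np.zeros((t, b))   # incidence matrix
T = np.zeros(t)        # treatment totals
for j, block in enumerate(blocks):
    for l, i in enumerate(block):
        n[i, j] = 1.0
        T[i] += y[j, l]
B, G, r = y.sum(axis=1), y.sum(), n.sum(axis=1)

Omega = np.linalg.inv(np.diag(r) - n @ n.T / k + np.outer(r, r) / (b * k))  # (38.23)
tau_hat = Omega @ (T - n @ B / k) + G / (b * k)                             # (38.29)
beta_hat = (B - n.T @ tau_hat) / k                                          # (38.32)

# Independent check: ordinary LS on E(y) = tau_i + beta_j, with one extra
# dummy "observation" of zero whose expectation is the sum of the beta_j.
rows, obs = [], []
for j, block in enumerate(blocks):
    for l, i in enumerate(block):
        x = np.zeros(t + b)
        x[i], x[t + j] = 1.0, 1.0
        rows.append(x)
        obs.append(y[j, l])
constraint = np.zeros(t + b)
constraint[t:] = 1.0
rows.append(constraint)
obs.append(0.0)
theta, *_ = np.linalg.lstsq(np.array(rows), np.array(obs), rcond=None)

print(np.allclose(theta[:t], tau_hat), np.allclose(theta[t:], beta_hat))
```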

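AV for block experiments

38.24 In order to construct the AV for the block experiment, we now require
the Residual SS. This is

$$S_0 = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\theta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\theta}}) = \mathbf{y}'\mathbf{y} - \mathbf{y}'\mathbf{X}\hat{\boldsymbol{\theta}},$$

and using (38.18) and (38.11) this is

$$S_0 = \mathbf{y}'\mathbf{y} - \begin{pmatrix} \mathbf{T} \\ \mathbf{u}\mathbf{B} \end{pmatrix}' \begin{pmatrix} \hat{\boldsymbol{\tau}} \\ \hat{\boldsymbol{\alpha}} \end{pmatrix} = \mathbf{y}'\mathbf{y} - \mathbf{T}'\hat{\boldsymbol{\tau}} - \mathbf{B}'\mathbf{u}'\hat{\boldsymbol{\alpha}} = \mathbf{y}'\mathbf{y} - \mathbf{T}'\hat{\boldsymbol{\tau}} - \mathbf{B}'\hat{\boldsymbol{\beta}}. \tag{38.33}$$

In general, the AV is non-orthogonal, since the off-diagonal matrices in (38.17) are
non-null. We must therefore, as in Example 35.4, 35.38 and 35.43, find the Residual
SS, say $S_1$, when there are no treatment differences, only block parameters and a single
treatment parameter being estimated. The difference $S_1 - S_0$ will then be the SS
attributable to treatment differences.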

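38.25 We thus require to modify $\hat{\boldsymbol{\tau}}$ so that it is of form $\tilde{\tau}\mathbf{1}_t$. In (38.24), this gives

$$\tilde{\tau}\mathbf{1}_t = \boldsymbol{\Omega}\{\mathbf{T} - \mathbf{n}\mathbf{B}/k + \mathbf{r}G/(bk)\}.$$

We substitute $\boldsymbol{\Omega}\mathbf{r}$ for $\mathbf{1}_t$ from (38.28). Premultiplying by $\boldsymbol{\Omega}^{-1}$, transposing and
post-multiplying by $\mathbf{1}_t$, we have

$$\tilde{\tau}\,\mathbf{r}'\mathbf{1}_t = \{\mathbf{T} - \mathbf{n}\mathbf{B}/k + \mathbf{r}G/(bk)\}'\mathbf{1}_t.$$

The first two terms on the right cancel, because (38.25) and (38.4) give

$$\mathbf{1}_t'(\mathbf{T} - \mathbf{n}\mathbf{B}/k) = 0. \tag{38.34}$$

Using (38.27) on the left and in the remaining term on the right, we find $\tilde{\tau} = G/(bk)$, or

$$\tilde{\tau}\mathbf{1}_t = \mathbf{1}_t G/(bk), \tag{38.35}$$

the intuitively obvious result that, in the absence of treatment differences, the general
mean is the estimator for all treatments. We now find from (38.32), using (38.35) and
(38.4), that in this case

$$\tilde{\boldsymbol{\beta}} = (\mathbf{B} - \mathbf{1}_b G/b)/k. \tag{38.36}$$

Substituting (38.35-6) in (38.33), we find using (38.25) that

$$S_1 = \mathbf{y}'\mathbf{y} - \mathbf{B}'\mathbf{B}/k, \tag{38.37}$$

so that (as the reader is asked to verify in Exercise 38.1)

$$S_1 - S_0 = \mathbf{T}'\hat{\boldsymbol{\tau}} + \mathbf{B}'\hat{\boldsymbol{\beta}} - \mathbf{B}'\mathbf{B}/k = (\mathbf{T} - \mathbf{n}\mathbf{B}/k)'\boldsymbol{\Omega}(\mathbf{T} - \mathbf{n}\mathbf{B}/k) \tag{38.38}$$

is the SS for treatment differences, while from (38.37) the combined SS for blocks
and the general mean is $\mathbf{B}'\mathbf{B}/k$.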

38.26 We may now display all these results in the AV table:

Source of variation                SS                                                    D.fr.

Treatment differences              T'τ̂ + B'β̂ − B'B/k
  (eliminating block effects)        = (T − nB/k)' Ω (T − nB/k)                          t − 1
Block effects (ignoring
  treatment differences)           B'B/k − G²/(bk)                                       b − 1
Residual                           y'y − T'τ̂ − B'β̂                                       bk − b − t + 1
General mean                       G²/(bk)                                               1

Total                              y'y                                                   bk

                                                                                         (38.39)

The d.fr. for the Residual are obtained as a difference. (38.39) makes it clear that
our analysis has simply separated off, from the d.fr. remaining after the $(b-1)$
independent block parameters $\boldsymbol{\alpha}$ and the general mean are allowed for, $(t-1)$ d.fr.
for treatment differences.
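
The AV table (38.39) may be sketched numerically as follows, again for a hypothetical incomplete block design with invented data (none of the numbers comes from the text). The fragment checks that the two expressions for the treatment SS in (38.38) agree, that the Residual SS of (38.33) equals the directly computed sum of squared residuals, and that the SS and d.fr. add to their totals.

```python
import numpy as np

# Hypothetical design and data: t = 4 treatments, b = 4 blocks of k = 3 units.
t, k = 4, 3
blocks = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
b = len(blocks)
y = np.array([[12.0, 15.0, 11.0],
              [14.0, 16.0, 9.0],
              [13.0, 10.0, 8.0],
              [17.0, 12.0, 10.0]])

n = np.zeros((t, b))
T = np.zeros(t)
for j, block in enumerate(blocks):
    for l, i in enumerate(block):
        n[i, j] = 1.0
        T[i] += y[j, l]
B, G, r = y.sum(axis=1), y.sum(), n.sum(axis=1)

Omega = np.linalg.inv(np.diag(r) - n @ n.T / k + np.outer(r, r) / (b * k))
Q = T - n @ B / k                       # treatment totals adjusted for blocks
tau_hat = Omega @ Q + G / (b * k)       # (38.29)
beta_hat = (B - n.T @ tau_hat) / k      # (38.32)

total = (y ** 2).sum()
table = {                               # source: (SS, d.fr.), as in (38.39)
    "treatments (eliminating blocks)": (Q @ Omega @ Q, t - 1),
    "blocks (ignoring treatments)": (B @ B / k - G ** 2 / (b * k), b - 1),
    "residual": (total - T @ tau_hat - B @ beta_hat, b * k - b - t + 1),
    "general mean": (G ** 2 / (b * k), 1),
}

# Direct residual sum of squares from the fitted values tau_i + beta_j.
fitted = np.array([[tau_hat[i] + beta_hat[j] for i in block]
                   for j, block in enumerate(blocks)])
rss_direct = ((y - fitted) ** 2).sum()

for source, (ss, df) in table.items():
    print(f"{source:35s} {ss:10.4f} {df:3d}")
```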


The design of block experiments

38.27 The crucial computation in the preceding analysis is that of $\boldsymbol{\Omega}$ at (38.23),
requiring a matrix inversion. $\boldsymbol{\Omega}$ depends upon the incidence matrix $\mathbf{n}$ and the vector $\mathbf{r}$
obtained from it by (38.3). Since $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ is, by (19.16), the dispersion matrix
of $\hat{\boldsymbol{\theta}}$, (38.17) shows that

$$V(\hat{\boldsymbol{\tau}}) = \sigma^2\boldsymbol{\Omega}. \tag{38.40}$$

We now seek to design the experiment (i.e. choose $\mathbf{n}$) so that the dispersion matrix
(38.40) of the treatment estimators has some desirable form. ... which will lead us to
the most ... are optimum in ... generalized variance(*) (the determinant of (38.40))
and have optimum ... equality of all treatment parameters. Remarkably, these optimum
properties are not retained if the design is chosen by a random procedure, although
the generalized variance is still ... as block size $k$ ...

For some theory and discussion of such random allocation (including random balance)
designs, cf. Dempster (1959), ... (1959), the discussion of the last two papers by
Youden et al., and ...

38.28 If we choose $\mathbf{n}$ to make $\boldsymbol{\Omega}$ diagonal, the treatment parameter estimators
will be uncorrelated (orthogonal) and in addition the required matrix inversion will
be trivial. $\boldsymbol{\Omega}$ will be diagonal if and only if its inverse is, and since the first term in
the braces in (38.23) is already diagonal, we require only that $\mathbf{M} = \mathbf{n}\mathbf{n}' - \mathbf{r}\mathbf{r}'/b$ should
be diagonal. Now, using (38.4) and (38.27),

$$\mathbf{1}_t'\mathbf{M}\mathbf{1}_t = \mathbf{1}_t'(\mathbf{n}\mathbf{n}' - \mathbf{r}\mathbf{r}'/b)\mathbf{1}_t = k\mathbf{1}_b'k\mathbf{1}_b - (bk)^2/b = 0. \tag{38.41}$$

Thus the sum of all the elements of $\mathbf{M}$ is always zero. If $\mathbf{M}$ is to be a diagonal
matrix, its off-diagonal elements must all be zero, and therefore the sum of its diagonal
elements, say $M_{ii}$, must be zero, i.e.

$$0 = \sum_{i=1}^{t} M_{ii} = \sum_{i=1}^{t}\Big(\sum_{j=1}^{b} n_{ij}^2 - r_i^2/b\Big) = \sum_{i=1}^{t}\sum_{j=1}^{b}(n_{ij} - r_i/b)^2 \tag{38.42}$$

on using (38.3). Thus we must have

$$n_{ij} - r_i/b = 0, \quad \text{all } i, j, \tag{38.43}$$

or in matrix terms

$$\mathbf{n} = \mathbf{r}\mathbf{1}_b'/b. \tag{38.44}$$

The condition for $\boldsymbol{\Omega}$ to be diagonal is thus that every block contains the same set of
treatments, and every treatment is applied to a total number of units which is a multiple
of $b$. Since each treatment occurs 0 or 1 times in any block, and every treatment occurs
in the experiment as a whole, this implies that each treatment occurs once in
every block, so that

$$\mathbf{r} = b\mathbf{1}_t, \tag{38.45}$$

and (38.44) becomes simply

$$\mathbf{n} = \mathbf{1}_t\mathbf{1}_b', \tag{38.46}$$

which is the incidence matrix of a randomized blocks design. Exercises 38.2-3 give
related results.

(*) Within any design, we know from Vol. 2 that the LS estimators minimize the
variance among linear estimators. The result above refers to the choice between designs.
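
The diagonality condition of 38.28 can be illustrated by a short NumPy sketch. Both incidence matrices below are hypothetical examples: one places every treatment in every block, the other is an incomplete design. The helper names `M_matrix`, `omega` and `is_diagonal` are introduced here for illustration only.

```python
import numpy as np

def M_matrix(n):
    """M = nn' - rr'/b for an incidence matrix n of order (t x b)."""
    b = n.shape[1]
    r = n.sum(axis=1)
    return n @ n.T - np.outer(r, r) / b

def omega(n):
    """Omega of (38.23); assumes every block has the same number k of units."""
    b = n.shape[1]
    k = n.sum(axis=0)[0]
    r = n.sum(axis=1)
    return np.linalg.inv(np.diag(r) - n @ n.T / k + np.outer(r, r) / (b * k))

def is_diagonal(a):
    return np.allclose(a, np.diag(np.diag(a)))

# Every one of t = 3 treatments in each of b = 4 blocks (k = t).
complete = np.ones((3, 4))
# An incomplete design: t = 4 treatments, b = 4 blocks, k = 3 units per block.
incomplete = np.array([[1, 1, 1, 0],
                       [1, 1, 0, 1],
                       [1, 0, 1, 1],
                       [0, 1, 1, 1]], dtype=float)

for name, n in [("complete", complete), ("incomplete", incomplete)]:
    M = M_matrix(n)
    print(name, "sum of elements of M:", M.sum(), "diagonal:", is_diagonal(M))
print(omega(complete))
```

In both cases the elements of M sum to zero, as (38.41) requires, but only the complete design makes M (and hence Omega) diagonal.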

Randomized blocks designs

38.29 We have therefore arrived, by a lengthy route, at the randomized blocks
designs which we briefly mentioned in 36.39, when we first discussed the randomized
allocation of units to treatments. The structure of a randomized block experiment
is extremely simple: $t$ treatments are randomly allocated to the $k = t$ units in each
of $b$ blocks. Because of the diagonality of $\boldsymbol{\Omega}$ from which the designs were deduced
in 38.28, the estimators of the parameters and the general AV table for block
experiments simplify greatly. Using (38.45-6) in (38.23) we find

$$\boldsymbol{\Omega} = \mathbf{I}_t/b, \tag{38.47}$$

while use of (38.46) and (38.25) in (38.29) and (38.32) gives

$$\hat{\boldsymbol{\tau}} = \mathbf{T}/b, \tag{38.48}$$

$$\hat{\boldsymbol{\beta}} = \mathbf{B}/t - \mathbf{1}_b G/(bt). \tag{38.49}$$

Similarly, the SS for treatment differences (38.38) becomes

$$S_1 - S_0 = \mathbf{T}'\mathbf{T}/b - G^2/(bt) \tag{38.50}$$

and the Residual SS (38.33) becomes

$$S_0 = \mathbf{y}'\mathbf{y} - \mathbf{T}'\mathbf{T}/b - \mathbf{B}'\mathbf{B}/t + G^2/(bt). \tag{38.51}$$

The reader is asked to verify these formulae in Exercise 38.4.
38.30 We now display the simplified AV table:

AV table for a randomized blocks experiment

Source of variation        SS                                       D.fr.

Treatment differences      T'T/b − G²/(bt)                          t − 1
Block differences          B'B/t − G²/(bt)                          b − 1
Residual                   y'y − T'T/b − B'B/t + G²/(bt)            (t − 1)(b − 1)
General mean               G²/(bt)                                  1

Total                      y'y                                      bt

                                                                    (38.52)

Comparison with (38.39) reveals the extent of the simplification.
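
The randomized blocks table (38.52) may be computed directly from the treatment totals, block totals and grand total. The following NumPy sketch uses invented observations (rows as blocks, columns as treatments) and checks the Residual SS against the sum of squared residuals from the fitted values of (38.48-9).

```python
import numpy as np

# Hypothetical data: rows are b = 3 blocks, columns are t = 4 treatments.
y = np.array([[20.0, 24.0, 19.0, 27.0],
              [18.0, 23.0, 20.0, 25.0],
              [22.0, 28.0, 21.0, 30.0]])
b, t = y.shape
T, B, G = y.sum(axis=0), y.sum(axis=1), y.sum()
cf = G ** 2 / (b * t)                   # SS for the general mean

ss = {
    "treatments": T @ T / b - cf,                                # (38.50)
    "blocks": B @ B / t - cf,
    "residual": (y ** 2).sum() - T @ T / b - B @ B / t + cf,     # (38.51)
    "general mean": cf,
}
df = {"treatments": t - 1, "blocks": b - 1,
      "residual": (t - 1) * (b - 1), "general mean": 1}

tau_hat = T / b                         # (38.48)
beta_hat = B / t - G / (b * t)          # (38.49)
resid = y - tau_hat[np.newaxis, :] - beta_hat[:, np.newaxis]

for source in ss:
    print(f"{source:14s} {ss[source]:10.4f} {df[source]:3d}")
```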

The symmetry of the table (38.52) as between treatments (the symbols T, t) and
blocks (the symbols B, b) makes it clear that blocks are in fact being treated as a
(nuisance) factor in the analysis, for (38.52) is formally identical with the AV table for
the two-way cross-classification with one observation per cell (cf. ... and Exercise
...). As always in that analysis, there is no Interactions SS, though we can separate
off a single d.fr. from the Residual, as in Example 35.3. Thus

132 . f ftoffl the Re^ u3 ' e m odeI that trea 

off a si "8‘ e £ ,hc assump tlon ® . . tests were taken, in th e 

that negfett expet^t^'^^^p^erhfftent^in^ndom^^ 

<hiS CritiCiSm 15 m " 

SocMeach block consisting ° nuisan ce fact° r * d by the 4 blocks used. 

mfe ing degrees of *-**; (hat ( * -1) d.fr. «• ^ ith i„ blocks, to keep the 
*"j£I. wUI be seen frommust be correspondingly 

lessary, owing to great varia r ber 0 f blocKS mber of de g r e es 

If h 18 " ec f S D er block very small, tn . Thus a Iarg an a , 

SZ&S&&S 7 

native experiment e g ^ , s s mall. . . ,, cbs we must arrange hat 

^To economize the dumber ° as in38-l^ n ^ simple ' rows and 

r, 0 "«r;: saW „. /=u , .«. 

fa = Pi+rp L ' ’ ’ 7.' ’. ’ tflP effects of two nuisance factors, \ 
Thus we are, in effect, using the blocksi to ehmm ^ ^ blocfa (3g .s 3) is evidently '* 

corresponding to the row- and column-c tw0 nu isance factors have main 

—rl rx - false, the analysis is invahd. 

As before, we impose the constraints 

1 W, 


Z P, = 'Zy, = 0 


1=1 i=i 

;n sure that the treatment effects can be identified. There will thus be only 
+ d.fr. absorbed by the blocks. 

tVe may now sketch the LS analysis, which is quite analogous to that of 38.15-26. 

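38.32 There are now $nm$ treatment matrices $\mathbf{t}_{ij}$, as in 38.16. We let $\mathbf{n}$ be the
incidence matrix of order $(t \times n)$ relating to the rows, so that $n_{il}$ is 1 or 0 according as
the $i$th treatment occurs in the $l$th row (irrespective of column); similarly, $\mathbf{m}$ is the
incidence matrix for columns, of order $(t \times m)$. We then have, as at (38.3),

$$\mathbf{n}\mathbf{1}_n = \mathbf{m}\mathbf{1}_m = \mathbf{r}. \tag{38.55}$$

To implement (38.54), (38.8) is replaced by

$$\mathbf{1}_n'\boldsymbol{\rho} = \mathbf{1}_m'\boldsymbol{\gamma} = 0, \tag{38.56}$$

and instead of $\boldsymbol{\alpha}$ defined at (38.10), we have

$$\boldsymbol{\alpha} = \mathbf{u}\boldsymbol{\rho}, \qquad \boldsymbol{\delta} = \mathbf{v}\boldsymbol{\gamma}, \tag{38.57}$$

where $\boldsymbol{\alpha}$ is of order $(n-1)$, $\boldsymbol{\delta}$ is of order $(m-1)$ and $\mathbf{u}$, $\mathbf{v}$ are analogues of $\mathbf{u}$ in
38.20. (38.12) now holds, with $\mathbf{y}$ a vector containing $nm$ vectors $\mathbf{y}_{ij}$, the rows of $\mathbf{X}$
corresponding to the block in row $i$ and column $j$ being

$$\underset{nmk \times (t+n+m-2)}{\mathbf{X}}: \quad (\mathbf{t}_{ij} \mid \mathbf{1}_k\mathbf{u}_i' \mid \mathbf{1}_k\mathbf{v}_j'), \tag{38.58}$$

and

$$\underset{(t+n+m-2) \times 1}{\boldsymbol{\theta}} = \begin{pmatrix} \boldsymbol{\tau} \\ \boldsymbol{\alpha} \\ \boldsymbol{\delta} \end{pmatrix}. \tag{38.59}$$

We find

$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} \operatorname{diag}(\mathbf{r}) & \mathbf{n}\mathbf{u}' & \mathbf{m}\mathbf{v}' \\ \mathbf{u}\mathbf{n}' & mk\mathbf{I}_{n-1} & \mathbf{0} \\ \mathbf{v}\mathbf{m}' & \mathbf{0} & nk\mathbf{I}_{m-1} \end{pmatrix}, \tag{38.60}$$

and if we write, analogously to (38.23),

$$\boldsymbol{\Omega} = \{\operatorname{diag}(\mathbf{r}) - \mathbf{n}\mathbf{n}'/(mk) - \mathbf{m}\mathbf{m}'/(nk) + 2\mathbf{r}\mathbf{r}'/(nmk)\}^{-1}, \tag{38.61}$$

the inverse of (38.60) is

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{pmatrix} \boldsymbol{\Omega} & -\boldsymbol{\Omega}\mathbf{n}\mathbf{u}'/(mk) & -\boldsymbol{\Omega}\mathbf{m}\mathbf{v}'/(nk) \\ -\mathbf{u}\mathbf{n}'\boldsymbol{\Omega}/(mk) & \{\mathbf{I}_{n-1} + \mathbf{u}\mathbf{n}'\boldsymbol{\Omega}\mathbf{n}\mathbf{u}'/(mk)\}/(mk) & \mathbf{u}\mathbf{n}'\boldsymbol{\Omega}\mathbf{m}\mathbf{v}'/(nmk^2) \\ -\mathbf{v}\mathbf{m}'\boldsymbol{\Omega}/(nk) & \mathbf{v}\mathbf{m}'\boldsymbol{\Omega}\mathbf{n}\mathbf{u}'/(nmk^2) & \{\mathbf{I}_{m-1} + \mathbf{v}\mathbf{m}'\boldsymbol{\Omega}\mathbf{m}\mathbf{v}'/(nk)\}/(nk) \end{pmatrix}, \tag{38.62}$$

so that (38.40) remains true. Just as at (38.18), we may write

$$\mathbf{X}'\mathbf{y} = \begin{pmatrix} \mathbf{T} \\ \mathbf{u}\mathbf{R} \\ \mathbf{v}\mathbf{C} \end{pmatrix}, \tag{38.63}$$

where $\mathbf{T}$, $\mathbf{R}$ and $\mathbf{C}$ are respectively the vectors of treatment, row and column totals.
$G$ is the grand total as before.

38.33 Multiplication of (38.62) by (38.63) gives the LS estimators. Instead of
(38.19), we now find

$$\hat{\boldsymbol{\tau}} = \boldsymbol{\Omega}\{\mathbf{T} - \mathbf{n}\mathbf{u}'\mathbf{u}\mathbf{R}/(mk) - \mathbf{m}\mathbf{v}'\mathbf{v}\mathbf{C}/(nk)\}, \tag{38.64}$$

which simplifies as at (38.24) to

$$\hat{\boldsymbol{\tau}} = \boldsymbol{\Omega}\{\mathbf{T} - \mathbf{n}\mathbf{R}/(mk) - \mathbf{m}\mathbf{C}/(nk) + 2\mathbf{r}G/(mnk)\}. \tag{38.65}$$

As at (38.32), we find

$$\hat{\boldsymbol{\rho}} = (\mathbf{R} - \mathbf{n}'\hat{\boldsymbol{\tau}})/(mk), \qquad \hat{\boldsymbol{\gamma}} = (\mathbf{C} - \mathbf{m}'\hat{\boldsymbol{\tau}})/(nk). \tag{38.66}$$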

















tH E aP vAN (3 s.33)> 
***" &■ 



C'f 


( 38 . 67 ) 

( 38 . 68 ) 


( 38 . 69 ) 




we 


( 38 . 7 0) j 


, of (38.65)- ( 38 ' 7 .°.) is 'hi 


1 “ ‘ M°t"h e ResidU “‘ v ’f'R' B/( ” !J is fro* (3 f 
fin d th 5x 68 y J differences - Qi /( n mk), 

oC for treats ^ p'tf*' _ rI ght - ' aS at w lt fc 

- ** the iitrix in con S trncte d J verification of th, 

w.^ ,tus ' 

wher „f (38.38)- bio* effects- 

« A.»**5. - 3 (hod 38.31-3 is mos , 

formal* above ' . faCto rs by * e m ^i s ts of a single unit, 

Latin ^ |iffljnat ion of wo n«f nCe Each block ^ th ing a bout how ‘he 
38.34 The etaffl = „ and * - J- 3 we have said t ^ n = m, So 

commonly effect^ array. In We now assum ‘ d om ly allocated. 

an d the units are in a q t0 thei block ts are to be ran 

If * impose the of the array, we^obtain h labe lled as A, fi, 

andiUS, h“«™P' ifiedby(38 - 7) ' 




square. 

C,D: 


A 

B 

C 

D 


B 

A 

D 

C 


C 

D 

A 

B 


D 

C 

B 

A 


( 38 . 72 / j 


D C B A 

Euler studied Latin squares extensively from a purely mathematical viewpoint 
in the eighteenth century. The fact that they have come to be useful in the design 
of experiments in the present century is a notable example or the possible ultimate 
practical value of apparently useless theorizing. 

38.35 If we look only at the rows of (38.71), we have a randomized blocks design
with b = t; and similarly if we look only at the columns. We may thus use the AV
table (38.52) with b = t and either set of block differences in the second row of the
table. This leads us to expect that we should be able to separate off both sets of block
differences to obtain an AV table which in outline is:

Source of variation        D.fr.

Treatment differences      t − 1
Rows                       t − 1
Columns                    t − 1
Residual                   (t − 1)(t − 2)
General mean               1

                           (38.72)

That this is so is immediately deducible from the results of 38.33, by putting
n = m = t and k = 1. We leave this to the reader as Exercise 38.6.
If the two nuisance factors interact, contrary to the ...

38.36 For a randomized blocks design, there is no difficulty in choosing one at
random: we require only random permutations of the numbers from 1 to t (cf. 9.14),
which choose with equal probabilities from among the $(t!)^b$ possible allocations
of treatments.

For Latin squares, however, the choice of a design at random is not so straightforward,
since it is not at all obvious how many squares of a given order there are, although it
is evident from some consideration of the cyclic permutations of the elements of the
first row that Latin squares of any order do exist.

The number of possible Latin squares of order t is very large for high values of t.
There are, for example, 576 squares of order 4; 161,280 of order 5; and
812,851,200 of order 6. Up to order 7, they have been completely enumerated, but,
although examples of squares of higher orders are known, the problem of enumeration
awaits solution. Details and examples will be found in Fisher and Yates' Statistical
Tables.

By interchanging rows and columns the square can always be brought to a form in
which the top row and left-hand column are in the order ABC, etc. It is then said
to be a “standard square.” For instance, there are four standard squares of the fourth
order:

        A B C D        A B C D        A B C D        A B C D
        B A D C        B C D A        B D A C        B A D C
        C D B A        C D A B        C A D B        C D A B
        D C A B        D A B C        D C B A        D C B A

                                                                (38.73)


From each of these, 144 (= 4! 3!) squares may be derived by permuting all columns,
and all rows except the first. (There is no point in permuting the first row, because
the result would be a repetition of squares already obtained with an interchange of
the letters A ... D, not an essentially different layout.) The total number of squares,
as stated above, is therefore 4 x 144 = 576. More generally, each standard square
yields $t!\,(t-1)!$ squares of order t.
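
The counts quoted in 38.36 for order 4 can be confirmed by brute-force enumeration. The sketch below is only practicable for very small orders; the function name `latin_squares` and the coding of the treatments as the integers 0, ..., t−1 are choices made here for illustration.

```python
from itertools import permutations
from math import factorial

def latin_squares(t):
    """Yield every Latin square of order t as a tuple of rows (brute force)."""
    perms = list(permutations(range(t)))

    def extend(rows):
        if len(rows) == t:
            yield tuple(rows)
            return
        for p in perms:
            # A new row must differ from every earlier row in every column.
            if all(p[c] != row[c] for row in rows for c in range(t)):
                yield from extend(rows + [p])

    yield from extend([])

t = 4
squares = list(latin_squares(t))

# Standard squares: first row and first column in natural order.
natural = tuple(range(t))
standard = [s for s in squares
            if s[0] == natural and tuple(row[0] for row in s) == natural]

predicted = len(standard) * factorial(t) * factorial(t - 1)
print(len(squares), len(standard), predicted)
```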

It is thus only necessary to specify the standard squares. To select a Latin square at random, we choose a standard form at random and then permute rows and columns at random, the randomizing process being most conveniently carried out by using tables of random permutations (cf. 9.14 and Example 9.7). For squares of order 8 or more, where the standard types have not been enumerated, we can only randomize one of the squares which are known, and hence select one at random from a restricted set of the possible squares.
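The selection procedure for order 4 may be sketched as follows. This is an illustration only: it assumes a pseudo-random number generator in place of the tables of random permutations referred to above, and the function name is hypothetical.

```python
import random

# The four standard squares of order 4, as in (38.73).
STANDARD_SQUARES = [
    ["ABCD", "BADC", "CDBA", "DCAB"],
    ["ABCD", "BCDA", "CDAB", "DABC"],
    ["ABCD", "BDAC", "CADB", "DCBA"],
    ["ABCD", "BADC", "CDAB", "DCBA"],
]

def random_latin_square(rng):
    """Pick a standard square, then permute all columns and all rows but the first."""
    standard = rng.choice(STANDARD_SQUARES)
    t = len(standard)
    cols = rng.sample(range(t), t)               # random order for all t columns
    rows = [0] + rng.sample(range(1, t), t - 1)  # the first row stays in place
    return ["".join(standard[i][j] for j in cols) for i in rows]

square = random_latin_square(random.Random(2024))  # seed chosen arbitrarily
for row in square:
    print(" ".join(row))
```

Since each of the 4 x 144 combinations of standard square and permutations yields a different square, every one of the 576 squares is selected with the same probability.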

Three or more nuisance factors: Graeco-Latin and orthogonal squares

38.37 There is no difficulty in generalizing the Latin square to provide for the elimination of three or more nuisance factors. We may do this by a process of superposition of different Latin squares. Consider, for example, the two squares

    A B C D      A B C D
    B A D C      C D A B
    C D A B      D C B A
    D C B A      B A D C        (38.74)

If the second square is superposed on the first, with the Roman letters of the second square of (38.74) first changed to the corresponding Greek letters to avoid confusion, we obtain the arrangement

    Aα Bβ Cγ Dδ
    Bγ Aδ Dα Cβ
    Cδ Dγ Aβ Bα
    Dβ Cα Bδ Aγ        (38.75)

in which each combination of a Roman and a Greek letter appears just once. The two squares in (38.74) are said to be orthogonal (Latin) squares for this reason. Their superposition (38.75) is called a Graeco-Latin square.

There is evidently no Graeco-Latin square for t = 2; more remarkably, there is none when t = 6 even though there are 812,851,200 Latin squares of that order. Euler conjectured that no Graeco-Latin square exists when t ≡ 2 (mod 4); it took nearly two centuries to disprove his conjecture and show (Bose and Shrikhande, 1959, 1960; Bose et al., 1960) that a Graeco-Latin square exists for every t except when t = 2 or 6. Fisher and Yates' Tables give examples.

The Greek letters in (38.75) may be used to identify a third nuisance factor (rows and columns, as before, identifying the first two), while the Roman letters are the treatments as before. The design then eliminates the effects of three nuisance factors in exactly the same way as the Latin square eliminates the effects of two. The AV table is an obvious generalization, which we leave to the reader in Exercise 38.6.
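Orthogonality of two Latin squares is easily checked mechanically: on superposition, all $t^2$ ordered pairs of symbols must be distinct. A minimal sketch follows, using the Roman and Greek squares that make up the arrangement (38.75); the lower-case letters a, b, c, d stand in for α, β, γ, δ, and the function name is ours.

```python
# Roman and Greek components of the Graeco-Latin square (38.75).
ROMAN = ["ABCD", "BADC", "CDAB", "DCBA"]
GREEK = ["abcd", "cdab", "dcba", "badc"]   # a, b, c, d stand for alpha ... delta

def are_orthogonal(sq1, sq2):
    """True if every ordered pair of symbols occurs just once on superposition."""
    t = len(sq1)
    pairs = {(sq1[i][j], sq2[i][j]) for i in range(t) for j in range(t)}
    return len(pairs) == t * t

print(are_orthogonal(ROMAN, GREEK))   # the two squares are orthogonal
print(are_orthogonal(ROMAN, ROMAN))   # a square is never orthogonal to itself
```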

38.38 A further Latin square (using a third set of symbols, say numerals) may be superposed upon (38.75) so that each combination of any two sets of symbols occurs just once, and the three Latin squares are mutually orthogonal. This is true for the arrangement

    Aα1 Bβ2 Cγ3 Dδ4
    Bγ4 Aδ3 Dα2 Cβ1
    Cδ2 Dγ1 Aβ4 Bα3
    Dβ3 Cα4 Bδ1 Aγ2        (38.76)

If the rows, columns, Greek letters and numerals are identified with nuisance factors, this design will eliminate the effects of four nuisance factors. The AV is again left to the reader in Exercise 38.6.

38.39 No further Latin square can be superposed on (38.76) to achieve this: the three squares form a complete set of orthogonal Latin squares. Such sets exist if t is prime or a power of a prime (cf. Mann (1949)), and hence for all t ≤ 9 except 2 and 6, when we have seen in 38.37 that not even a pair of orthogonal squares exists. For 10 ≤ t ≤ 30, Barra (1965) gives details of the known sets of orthogonal squares. The complete sets have been enumerated for t ≤ 7 [...]. Fisher and Yates' Tables give examples for t ≤ 9.

For details of the theory and construction of Latin squares and orthogonal sets, into which we shall not enter here, the reader should refer to Mann (1949) [...] Galois field methods due to R. C. Bose and [...] bibliography of [...] by Norton [...].

38.40 The practical usefulness of Latin squares in experimental work is restricted by the condition that the number of treatments must be the same as the number of levels of each nuisance factor, and this restriction increases as we pass through the Graeco-Latin to the higher-order sets of orthogonal squares. In consequence, the latter arrangements are little used. However, Latin squares are frequently used in
agricultural experimentation, where the rows and columns of the square array represent 
the physical rows and columns in which the experimental plots (units) are laid out 
In this way, soil fertility gradients across the experimental area in these two directions 
will have no effect on the treatments. Of course, there may be fertility gradients in 
other directions, e.g. diagonally to the square array, which the Latin square arrangement 
does not eliminate. The experimenter will, however, choose the orientation of his 
rows and columns to eliminate known or likely fertility gradients. 

It is clear that Latin square arrangements are of possible use whenever there are 
two geographical or temporal co-ordinates to be eliminated, and similarly that the 
higher-order arrangements may be called on if there are three or more such nuisance 
factors. 


\ 




Example 38.3 

In the experiment discussed in Examples 38.1-2, it might be suspected that the 
day of the week on which the tests are carried out also influences the result. The 
hypothesis here would be that there is some kind of cumulative fatigue through the 
working week, acting similarly to the “ time of day ” effect already discussed. We 
can eliminate the effect of both these nuisance factors by choosing as many times of day as there are days in the working week (say 5) and arranging the experiment in a 5 x 5 Latin square design. Notice that only 5 treatments (alcohol-doses) would be possible if we used a single square. There is, however, nothing to prevent our using as many 5 x 5 squares as are required to test all the proposed treatments, so long as we make the number of the latter a multiple of 5.

If, in addition to time of day and day of the week, place of work is regarded as a third nuisance factor influencing the experiment, we could choose five work-places and arrange the experiment as a Graeco-Latin square. But it is precisely the inflexibility of having to choose five work-places and five times of day, merely because the number of working days is fixed at five, which often makes these designs inconvenient.
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138 in randomized bloc s inference of the random 

Randomization and j queS tion of the effectjup» riment . The Sat ^ 

38.41 In 38.18 we left the blocks of a ^ ents t0 units in a La ti ' 

allocation of treatments o ^ ran( iom allocation - n sonl e detail. 

question arises with mspe ^ examine this que ^ ^ ( a ge netal 

square (cf. 38.34). vv j n the nrs v . * acts 0 f randomization 

Thc “nest™ 1 "'J£ Xnce) C -^gr tC tfTemS -k how this will *£ 

sssoSs-t rrss^s*—- - - 

their normality, rut tms way, , . 

f “ However, the question may be put more^directly and a ^ b j““ e 7 en in Chapter 31 
dons generate equiprobable sets of observa “°“’• distr ibutions can permit us to 
and subsequently that consideration oJ e r (distribution-free) methods. These 

replace normal distribution theory y g normality assumption is valid, and may 
often lose little or no effic ency even wh<» ^“TwhSer randomization can here 

° n the normality assumption< 

38.42 A detailed account of randomization theory in randomized blocks and Latin squares is contained in the penultimate chapter of Scheffé (1959), which gives references to the literature. From the point of view of estimation, the most interesting results are those for the expected values of MS in the AV tables, quoted from Kempthorne (1952) and from Wilk and Kempthorne (1957) (cf. also D. R. Cox (1958b)).

For the randomized blocks design, the expected MS for treatments is less than that for the Residual by a term depending upon the interactions between blocks and treatments, as well as exceeding it by the usual term depending upon treatment effects. (It is noteworthy that the presence of interactive errors (cf. 36.41) between treatments and units within blocks does not affect the situation.) The Residual MS is thus inflated by blocks-treatments interactions. However, this difficulty disappears if (as is often appropriate) block effects are treated as random, rather than fixed, effects: the Residual MS may then properly be used to assess the magnitude of the MS for treatments.

For the Latin square, the situation is more complicated, for here interactive errors do have some effect upon the comparison of the MS for treatments with that for Residual. No simple result emerges, for essentially the reason discussed in 36.42.

38.43 For designs where there are no unit errors (cf. 36.41), we have already developed the theory under randomization in 37.39-41, where we were concerned with a permutation test for column-effects in a two-way classification with one observation per cell. We have seen in 38.30 that the randomized blocks design is precisely such a classification, so that the results of 37.39-41 hold for treatment effects in randomized blocks, the rows here being interpreted as blocks and the columns as treatments. We may therefore apply the usual AV test for treatments in (38.52), with d.fr. modified as in 37.29, as indicated in 37.40. (A few sampling experiments [...] indicate that the power of the test is also robust to non-normality.) Alternatively, we may use distribution-free tests based on ranks or normal scores, as in 37.41 and Exercise 37.14, with negligible correction [...] if neither the number of treatments nor the number of blocks is too small.

If there are unit errors, Ogawa (1961, 1963) shows that the standard F-test may still be justified as an approximation if the variances of unit effects within blocks are nearly constant and the number of blocks is large enough.

Even in the absence of unit errors, the permutation test for treatment effects in Latin squares is less satisfactory than that for randomized blocks, as we saw for estimation in 38.42. As before, the expected value of the usual AV test statistic is the same under randomization as under normal theory, but the variance is complicated, and in consequence the evidence for the approximation of the permutation distribution by normal theory is very limited (Welch (1937); Scheffé (1959)).

38.44 The fact that the evidence for the validity of normal theory tests in randomized Latin squares is flimsy, together with the even greater paucity of such evidence for most other, more complicated, experiment designs, leads one to doubt the prevailing serene assumption that randomization theory will always approximate normal theory.

There is a question of principle involved here. Is randomization to be explicitly incorporated into the theory underlying our tests and estimation procedures? Since randomization lies at the root of the modern approach to statistical inference (cf. 38.7), it seems difficult not to answer this question in the affirmative, and consequently difficult to defend the relative neglect of this admittedly complicated branch of distribution theory.

The variances of treatment differences in block experiments 

38.45 We now return to the problem of design in the general analysis of block 
experiments given in 38.19-26 above. Instead of requiring, as we did in 38.27, that 
the treatment parameter estimators should be orthogonal (leading to the randomized 
blocks design, as we found in 38.28-30), we now formulate the design problem in 
terms of the variances of the differences between these estimators. 

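Suppose that we wish the experiment to result in the variance of the difference between the ith and lth treatment parameter estimators being $-2\sigma^2 d_{il}$, say. If we write $\sigma^2 w_{il}$ for the (i, l)th element of the dispersion matrix of the treatment parameter estimators, equal to $\sigma^2\boldsymbol{\Omega}$ by (38.40) with $\boldsymbol{\Omega}$ defined by (38.23), we have

$$w_{ii} - 2w_{il} + w_{ll} = -2d_{il}, \qquad (38.77)$$

so that

$$w_{il} = \tfrac{1}{2}(w_{ii} + w_{ll}) + d_{il}. \qquad (38.78)$$

If we write $\mathbf{w}$ for the vector with elements $\{\tfrac{1}{2}w_{ii}\}$ and $\mathbf{D}$ for the matrix with elements $\{d_{il}\}$, with $d_{ii} = 0$ by (38.77), we may write $\boldsymbol{\Omega}$ in the form

$$\boldsymbol{\Omega} = \mathbf{w}\mathbf{1}_t' + \mathbf{1}_t\mathbf{w}' + \mathbf{D}. \qquad (38.79)$$

Inspection of (38.79) makes it clear that it is identical with

$$\boldsymbol{\Omega} = (\mathbf{w} - \tfrac{1}{2}c\mathbf{1}_t)\mathbf{1}_t' + \mathbf{1}_t(\mathbf{w} - \tfrac{1}{2}c\mathbf{1}_t)' + (\mathbf{D} + c\,\mathbf{1}_t\mathbf{1}_t') \qquad (38.80)$$

whatever the scalar c may be. Since $\mathbf{w}$ is itself at choice in the design of the block experiment, no generality is lost if we use (38.79) with $\mathbf{D}$ replaced by any matrix $\mathbf{D} + c\,\mathbf{1}_t\mathbf{1}_t'$, which differs from it by a scalar multiple of $\mathbf{1}_t\mathbf{1}_t'$.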

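38.46 Suppose now that $\mathbf{D}$, which determines the variances of treatment differences, has all its off-diagonal elements equal, so that we desire all differences to be estimated with the same precision. (The leading diagonal of $\mathbf{D}$, of course, contains zeros.) This can, by (38.78), be achieved by choosing the dispersion matrix $\boldsymbol{\Omega}$ of the treatment parameter estimators so that its diagonal elements $w_{ii}$ are all equal, and its off-diagonal elements $w_{il}$ are all equal. In this case, $\mathbf{w}$ is itself a scalar multiple of $\mathbf{1}_t$, and we can choose c in (38.80) so that

$$\mathbf{w} - \tfrac{1}{2}c\,\mathbf{1}_t = \mathbf{0}.$$

(38.80) then becomes

$$\boldsymbol{\Omega} = \mathbf{D}_c, \qquad (38.81)$$

where $\mathbf{D}_c$ stands for a particular matrix of class $\mathbf{D}$. (38.81) and (38.23) give

$$\mathbf{D}_c^{-1} = \operatorname{diag}(\mathbf{r}) - \mathbf{n}\mathbf{n}'/k + \mathbf{r}\mathbf{r}'/(bk)$$

or

$$\mathbf{n}\mathbf{n}' = k\{\operatorname{diag}(\mathbf{r}) - \mathbf{D}_c^{-1} + \mathbf{r}\mathbf{r}'/(bk)\}. \qquad (38.82)$$

(38.28) and (38.81) give

$$\mathbf{r} = \mathbf{D}_c^{-1}\mathbf{1}_t, \qquad (38.83)$$

and (38.83) used in (38.27) gives

$$bk = \mathbf{1}_t'\mathbf{D}_c^{-1}\mathbf{1}_t. \qquad (38.84)$$

Substitution of (38.83-4) into (38.82) gives the alternative form

$$\mathbf{n}\mathbf{n}' = k\left\{\operatorname{diag}(\mathbf{D}_c^{-1}\mathbf{1}_t) - \mathbf{D}_c^{-1} + \frac{\mathbf{D}_c^{-1}\mathbf{1}_t\mathbf{1}_t'\mathbf{D}_c^{-1}}{\mathbf{1}_t'\mathbf{D}_c^{-1}\mathbf{1}_t}\right\}. \qquad (38.85)$$

Although we have derived (38.85) under the special assumption that $\mathbf{D}$ has all off-diagonal elements equal, it holds quite generally for any $\mathbf{D}$, as Tocher (1952) showed directly (cf. Exercise 38.8).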


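38.47 Still considering the special assumption of 38.46, we see that if the off-diagonal elements of $\mathbf{D}$ are all equal to $-1/a$ (corresponding to variances of differences all equal to $2\sigma^2/a$), we have

$$\mathbf{D} = \frac{1}{a}(\mathbf{I}_t - \mathbf{1}_t\mathbf{1}_t'),$$

and since

$$\mathbf{D}_c = \mathbf{D} + c\,\mathbf{1}_t\mathbf{1}_t', \qquad (38.86)$$

$$\mathbf{D}_c = \frac{1}{a}\{\mathbf{I}_t + (ac-1)\mathbf{1}_t\mathbf{1}_t'\}. \qquad (38.87)$$

The inverse of (38.87) may be verified to be

$$\mathbf{D}_c^{-1} = a\left\{\mathbf{I}_t - \frac{(ac-1)}{1+t(ac-1)}\mathbf{1}_t\mathbf{1}_t'\right\},$$

which we simplify to

$$\mathbf{D}_c^{-1} = a\{\mathbf{I}_t + A\,\mathbf{1}_t\mathbf{1}_t'\}, \qquad (38.88)$$

where $A = (1-ac)/\{1+t(ac-1)\}$. We now find for (38.83)

$$\mathbf{r} = \mathbf{D}_c^{-1}\mathbf{1}_t = a(1+At)\mathbf{1}_t = r\mathbf{1}_t, \qquad (38.89)$$

where $r = a(1+At)$. We thus see that each treatment occurs in the experiment the same number, r, of times. Using (38.88-9), we find for (38.85)

$$\mathbf{n}\mathbf{n}' = k\{(r-a)\mathbf{I}_t + (a/t)\mathbf{1}_t\mathbf{1}_t'\}. \qquad (38.90)$$

It will be seen that the arbitrary constant A has now disappeared; it is only relevant in determining r.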
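The algebra of 38.46-7 may be checked numerically in exact rational arithmetic. The sketch below is ours and purely illustrative: the values t = 7, k = 3, a = 7/3 and c = 61/147 are assumptions, chosen so that r = 3. It verifies that the stated inverse of $\mathbf{D}_c$ is correct and that (38.85) then reduces to (38.90).

```python
from fractions import Fraction

def identity(t):
    return [[Fraction(int(i == j)) for j in range(t)] for i in range(t)]

def all_ones(t):
    return [[Fraction(1)] * t for _ in range(t)]

def combine(alpha, A, beta, B):
    """Return alpha*A + beta*B for square matrices stored as lists of rows."""
    return [[alpha * x + beta * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

t, k = 7, 3                                  # illustrative values (assumed)
a, c = Fraction(7, 3), Fraction(61, 147)     # illustrative values (assumed)

I, J = identity(t), all_ones(t)
D_c = combine(1 / a, I, (a * c - 1) / a, J)          # D_c = (1/a){I + (ac-1)11'}
A_const = (1 - a * c) / (1 + t * (a * c - 1))        # the constant A
D_c_inv = combine(a, I, a * A_const, J)              # D_c^{-1} = a{I + A 11'}
r = a * (1 + A_const * t)                            # r = a(1 + At)

# Right-hand side of (38.85), built directly from D_c^{-1}.
row_sums = [sum(row) for row in D_c_inv]             # D_c^{-1} 1
total = sum(row_sums)                                # 1' D_c^{-1} 1
lhs = [[k * ((row_sums[i] if i == j else 0) - D_c_inv[i][j]
             + row_sums[i] * row_sums[j] / total)
        for j in range(t)] for i in range(t)]
rhs = combine(k * (r - a), I, k * a / t, J)          # right-hand side of (38.90)

print(matmul(D_c, D_c_inv) == I, lhs == rhs, r)
```

With these values the common matrix is $2\mathbf{I}_t + \mathbf{1}_t\mathbf{1}_t'$, i.e. each treatment occurs three times and each pair of treatments once together.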


The design equation

38.48 An equation for $\mathbf{n}\mathbf{n}'$ is called a design equation. Because of the definition of $\mathbf{n}$ in 38.16, we see that the (i, l)th element of $\mathbf{n}\mathbf{n}'$ counts the number of times (say $\lambda_{il}$) in the experiment as a whole that the ith and lth treatments occur together in the same block. In particular, when i = l, the diagonal elements of $\mathbf{n}\mathbf{n}'$ are simply the frequencies $r_i$ with which the ith treatment occurs in the experiment.


Balanced incomplete blocks designs

38.49 We now interpret the particular design equation (38.90) in the light of 38.48. Equating its diagonal elements, we have

$$\{\mathbf{n}\mathbf{n}'\}_{ii} = r = k\{(r-a) + a/t\}$$

or

$$r = ak(t-1)/\{(k-1)t\}. \qquad (38.91)$$

The off-diagonal elements of (38.90) are

$$\{\mathbf{n}\mathbf{n}'\}_{il} = ka/t = \lambda, \qquad (38.92)$$

say. Thus, in the whole experiment, every pair of treatments occurs λ times together in the same block. (38.91-2) give

$$r(k-1) = \lambda(t-1). \qquad (38.93)$$

Since t > k, (38.93) implies λ < r, as is obvious from the definitions. Further, since each of t treatments occurs r times in the experiment, and each of b blocks contains k units, we have

$$rt = bk. \qquad (38.94)$$

Using (38.92-3), (38.90) may be written

$$\mathbf{n}\mathbf{n}' = (r-\lambda)\mathbf{I}_t + \lambda\mathbf{1}_t\mathbf{1}_t'. \qquad (38.95)$$

If r = λ, (38.95) is satisfied by the randomized blocks incidence matrix (38.46), for then k = t, r = b from (38.93-4); this is obvious from the definitions of r and λ.

We henceforth exclude this case, and consider only r > λ. A block experiment satisfying the design equation (38.95) and the conditions (38.93-4) is called a balanced incomplete blocks (BIB) design. These designs were introduced by Yates (1936). The requirements are that each of the t treatments occurs r times in b blocks of k units, while each pair of the treatments occurs together λ times in the same block. We were led to BIB designs by the requirement that every treatment difference be estimated with the same variance $2\sigma^2/a$ [...]. The five constants t, b, k, r, λ are connected by the two relations (38.93-4), so that three of them (commonly, the first three) determine the others.

38.50 Although (38.93-4) are necessary conditions (to be satisfied by integers t, b, k, r, λ) for a BIB design to exist, they are not in general sufficient conditions, since there may be no incidence matrix $\mathbf{n}$ satisfying the design equation (38.95) for $\mathbf{n}\mathbf{n}'$ (cf. Exercise 38.11). Further necessary conditions may be given. For example, (38.95) implies that the (t x t) matrix $\mathbf{n}\mathbf{n}'$ is non-singular, so that the (t x b) matrix $\mathbf{n}$ must have rank t and

$$b \geq t, \qquad (38.96)$$

which with (38.94) implies r ≥ k, a result originally found in a different way by R. A. Fisher. [...] (1965) summarizes other, more [...] necessary [...].

For a given BIB design, λ can always be increased [...] designs; examples appear in the table in 38.54 below. Further, several BIB designs of essentially different structure (non-isomorphic) may exist for the same values of t, k and λ. This is intuitively obvious from the fact that the values of r and λ are first- and second-order conditions only upon the disposition of the treatments into the blocks: they do not in general restrict the frequencies of triples, quadruples, etc., of treatments.
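The BIB conditions are readily verified for any proposed set of blocks by forming $\mathbf{n}\mathbf{n}'$ directly. The following sketch is ours: the seven blocks used are a well-known design with t = b = 7, k = r = 3, λ = 1, chosen here purely for illustration, and the function names are hypothetical.

```python
# Seven blocks of three treatments each, treatments labelled 0, ..., 6.
BLOCKS = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]

def incidence(blocks, t):
    """Incidence matrix n (t x b): n[i][j] = 1 if treatment i is in block j."""
    return [[int(i in block) for block in blocks] for i in range(t)]

def concurrence(n):
    """nn': element (i, l) counts the blocks containing both treatments i and l."""
    return [[sum(x * y for x, y in zip(ri, rl)) for rl in n] for ri in n]

def bib_constants(blocks, t):
    """Return (b, k, r, lam) if the blocks satisfy (38.95), else None."""
    b, k = len(blocks), len(blocks[0])
    if any(len(set(block)) != k for block in blocks):
        return None
    nn = concurrence(incidence(blocks, t))
    r, lam = nn[0][0], nn[0][1]
    balanced = all(nn[i][l] == (r if i == l else lam)
                   for i in range(t) for l in range(t))
    return (b, k, r, lam) if balanced else None

t = 7
b, k, r, lam = bib_constants(BLOCKS, t)
print(b, k, r, lam)
print(r * (k - 1) == lam * (t - 1), r * t == b * k, b >= t)  # (38.93), (38.94), (38.96)
```

A set of blocks failing the balance condition, such as the two blocks (0, 1) and (2, 3) on four treatments, is rejected by the same function.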


\ 

1 i 


/ 


38.51 If t and k are fixed, the third constant of the design being at choice, a BIB design can always be formed simply by taking every one of the $\binom{t}{k}$ selections of treatments as a block. We then have

$$b = \binom{t}{k}, \qquad r = \binom{t-1}{k-1}, \qquad \lambda = \binom{t-2}{k-2}. \qquad (38.97)$$

Such a BIB design is called unreduced. Since it requires


\ 
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be satisfied *>y an unreduced design (38.97) with A - i “ 

. 00 y * 1 (38.97) becomes n * ~ 

Reduced. 
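The constants (38.97) of an unreduced design can be confirmed by listing the blocks. A minimal sketch, with t = 6 and k = 3 chosen arbitrarily for illustration and with function names of our own:

```python
from itertools import combinations
from math import comb

def unreduced_design(t, k):
    """Every selection of k treatments out of t, each taken once as a block."""
    return list(combinations(range(t), k))

def replications_and_concurrences(blocks, t):
    """Sets of the distinct replication counts and distinct pair concurrences."""
    reps = {sum(i in blk for blk in blocks) for i in range(t)}
    concs = {sum(i in blk and l in blk for blk in blocks)
             for i, l in combinations(range(t), 2)}
    return reps, concs

t, k = 6, 3
blocks = unreduced_design(t, k)
reps, concs = replications_and_concurrences(blocks, t)
print(len(blocks), reps, concs)
print(comb(t, k), comb(t - 1, k - 1), comb(t - 2, k - 2))
```

Each set printed on the first line has a single member, agreeing with the binomial coefficients on the second; the number of blocks grows rapidly with t, which is the practical drawback of unreduced designs.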

38.52 Since each of t treatments appears r times, and each of b blocks contains k units, in the experiment, it is tempting to interchange the roles of treatments and blocks in the design, putting t' = b, b' = t, r' = k and k' = r to obtain a dual design. However, the dual will itself be a BIB design if and only if the original design is symmetric, with b = t (cf. Exercise 38.12 for illustrations).

If a BIB design can be resolved into r subsets of blocks, each subset containing each treatment exactly once, the design is called resolvable, and each subset is a single replicate of the t treatments. We must then clearly have t = ck, b = cr, where c is a positive integer. However, the latter is not alone a sufficient condition for resolvability. For resolvable BIB designs, (38.96) may be replaced by the stronger inequality, due to Bose (1942),

$$b \geq t + r - 1, \qquad (38.98)$$

which is again only a necessary condition for the existence of a resolvable BIB design, for (38.98) actually holds whenever t = ck, b = cr (cf. Exercise 38.13). If and only if the equality holds in (38.98), a resolvable design has $k^2/t$ treatments common to any two blocks in different replicates, and is called an affine resolvable BIB design (cf. Exercise 38.18).
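A very small example may make resolvability concrete. In the sketch below (our own illustration, not taken from the text), the unreduced design with t = 4, k = 2 is arranged in r = 3 replicates; we check that each replicate contains every treatment once, that (38.98) holds with equality, and that blocks in different replicates then have $k^2/t = 1$ treatment in common.

```python
# Three replicates of the design with t = 4, k = 2 (treatments 0, ..., 3).
REPLICATES = [
    [(0, 1), (2, 3)],
    [(0, 2), (1, 3)],
    [(0, 3), (1, 2)],
]
t, k = 4, 2
blocks = [blk for rep in REPLICATES for blk in rep]
b, r = len(blocks), len(REPLICATES)

# Each replicate must contain every treatment exactly once.
each_rep_complete = all(
    sorted(x for blk in rep for x in blk) == list(range(t)) for rep in REPLICATES
)

# Numbers of treatments shared by two blocks taken from different replicates.
common = {
    len(set(b1) & set(b2))
    for i, rep1 in enumerate(REPLICATES)
    for rep2 in REPLICATES[i + 1:]
    for b1 in rep1
    for b2 in rep2
}

print(each_rep_complete, b >= t + r - 1, b == t + r - 1, common)
```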


38.53 Mann (1949) gives an account of construction methods for BIB designs

due to R. C. Bose, whose fundamental series of papers, starting with Bose (1939), is 
listed in Guerin’s (1965) comprehensive bibliography of the subject—these and other 
methods of construction are summarized by Guerin. Muller (1965) gives a method 
for obtaining BIB designs from complete sets of orthogonal Latin squares when t 
is an integral power of a prime number (cf. Exercises 38.10-11 for some simple examples 
of such constructions given earlier in Fisher and Yates’ Tables). If t is odd and k 
does not exceed the smallest prime factor of t , a BIB may always be constructed by 
the method given in Exercise 38.20. If t is prime, this method is valid for any k<t. 


38.54 Fisher and Yates’ Tables give indexes, by the values of A and of r, of all 
known BIB designs with r^lO, together with combinatona met 10 s 0 0 
specific designs. Cochran and Cox (1957) give detailed p ans or a se e w j t j 1 

designs. C. R. Rao (1961) lists, and gives combinatorial methods , 

lUr<15 (which are also included and extended m t e t ^0 ^ith references to 
and Yates’ Tables) and Sprott (1962) lists designs with , 

constructive methods. ( 7 fnr w hi c h BIB designs, 

Table 38.1 gives, for *s;100 and 20, * e J a ^ tted from the table since there 

which are not unreduced, are known to exist. « , , When k = t- 1, there 

ls then always (cf. 38.51) an unreduced design with A 7 • % = t __ 2) which is 

always (by 38.51 again) a symmetric unreduced desig 
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our table. Further,^we may confine the table to the 
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also 


^tedf r( f n u = t-l already discussed) a design with k'>H l 

T*W one i k<¥ ™ lY \ C ° ml>lement 7 deSlgn ’ ob,ail *d ^cotitri™ 

a tary block containLg 

WC 


V _ (t -k )( t-k- 1 ) 
A(& — 1) 


> 1 . 


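Analysis of BIB designs

38.55 The analysis of an experiment designed in BIB may now be obtained by simple substitution in the results of 38.23-6, which are valid for any block experiment. Using (38.89) and (38.95) in (38.23), we have

$$\boldsymbol{\Omega}^{-1} = r\mathbf{I}_t - \frac{1}{k}\{(r-\lambda)\mathbf{I}_t + \lambda\mathbf{1}_t\mathbf{1}_t'\} + \frac{r^2}{bk}\mathbf{1}_t\mathbf{1}_t' = \frac{\lambda t}{k}\left\{\mathbf{I}_t + \frac{(t-k)}{t^2(k-1)}\mathbf{1}_t\mathbf{1}_t'\right\} \qquad (38.99)$$

on using (38.93-4) to eliminate r and b. (38.99) is of exactly the same form as (38.87), and its inverse, as there, is

$$\boldsymbol{\Omega} = \frac{k}{\lambda t}\left\{\mathbf{I}_t - \frac{(t-k)}{tk(t-1)}\mathbf{1}_t\mathbf{1}_t'\right\} \qquad (38.100)$$

as the reader may verify directly.

Substituting (38.100) into (38.29) and using (38.34), we find

$$\hat{\boldsymbol{\tau}} = \frac{k}{\lambda t}(\mathbf{T} - \mathbf{n}\mathbf{B}/k) + \mathbf{1}_t G/(bk), \qquad (38.101)$$

showing that the estimators of the treatment parameters are no longer free of the influence of the block totals. This is as it must be, since different sets of blocks are associated with the various treatments. From the definitions of $\mathbf{n}$ and $\mathbf{B}$, we see that $\mathbf{n}\mathbf{B}/k$ is a (t x 1) vector whose ith element is the sum of the block averages over all blocks containing the ith treatment. Thus

$$\mathbf{T}_a = \mathbf{T} - \mathbf{n}\mathbf{B}/k \qquad (38.102)$$

may be called the vector of adjusted treatment totals, and is evidently of direct interest. (38.101) becomes

$$\hat{\boldsymbol{\tau}} = \frac{k}{\lambda t}\mathbf{T}_a + \mathbf{1}_t G/(bk). \qquad (38.103)$$

(38.32) thus becomes, using (38.103) and (38.4),

$$\hat{\boldsymbol{\beta}} = \mathbf{B}/k - \frac{1}{\lambda t}\mathbf{n}'\mathbf{T}_a - \mathbf{1}_b G/(bk). \qquad (38.104)$$

The treatment differences SS in the AV table (38.39) is $\mathbf{T}_a'\boldsymbol{\Omega}\mathbf{T}_a$, and using (38.100) and (38.34), this is simply

$$\frac{k}{\lambda t}\mathbf{T}_a'\mathbf{T}_a. \qquad (38.105)$$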
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We thus 


iE ort ° f 

have * » 5fe /^U--— 

ss 


THE ^A^table, speC1S 


sTA tisT iCS 

(38.39)i 



T^ent^”^ 

Block effects 

Residual 
General mean 



v , *t:t,- b ' b/ ' 
y y ;* 

Qijibk) 


y'y 


p.fr- 



(38.106) 
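The intra-block computations (38.102-3) and (38.105) may be sketched as follows. The design is an illustrative one with t = b = 7, k = r = 3, λ = 1, and the observations are hypothetical, error-free values built from assumed treatment and block effects, so that the estimated treatment differences should reproduce the assumed ones exactly. All names are ours.

```python
from fractions import Fraction

BLOCKS = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
t, b, k, r, lam = 7, 7, 3, 3, 1

def intra_block_analysis(y):
    """y[j][i] is the observation on treatment i in block j (one dict per block)."""
    T = [sum(y[j][i] for j, blk in enumerate(BLOCKS) if i in blk) for i in range(t)]
    B = [sum(y[j].values()) for j in range(b)]
    G = sum(B)
    # Adjusted treatment totals T_a = T - nB/k, as at (38.102).
    T_a = [T[i] - sum(Fraction(B[j]) / k for j, blk in enumerate(BLOCKS) if i in blk)
           for i in range(t)]
    # Treatment estimates (38.103) and treatment SS (38.105).
    tau_hat = [Fraction(k, lam * t) * T_a[i] + Fraction(G) / (b * k) for i in range(t)]
    ss_treatments = Fraction(k, lam * t) * sum(x * x for x in T_a)
    return T_a, tau_hat, ss_treatments

# Hypothetical error-free data: general level 20 plus treatment and block effects.
true_tau = [0, 1, 2, 3, 4, 5, 6]
true_beta = [10, -4, 7, 0, 3, -8, 5]
y = [{i: 20 + true_tau[i] + true_beta[j] for i in blk} for j, blk in enumerate(BLOCKS)]

T_a, tau_hat, ss = intra_block_analysis(y)
print([tau_hat[i] - tau_hat[0] for i in range(t)])
print(ss)
```

The adjusted totals sum to zero, and the large assumed block effects leave the treatment comparisons untouched, which is the purpose of the adjustment.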


^ th at ( 38 . 106 ) reduces to 


, ; 7e( f blocks AV table 

the randomiz eo 


( 38 .^"we er p“« I Kempthorne (1952) 

Cochran and Cox ( 5 ) w | t h attention t take into account 


^Cochran and Cox (1«? > attention to the simp into account 

rfflfi SS^tSl-r«a" — s “ - 


the recovery of inter- 

c. R. Rao (1W)- must com pl e te our discussion i 

38.56 It might be supposed that the results of 38.55 must complete our discussion of the analysis of BIB designs, but the model (38.6) on which all our results are based is a linear model with fixed effects: we have been carrying out a Model I LS analysis, as in Chapter 35. So far as the treatment parameters are concerned, this is appropriate. But we have seen in 38.14 that the blocks in an experiment are usually of no direct interest. The particular blocks used in an experiment are not essential to it. It is not unrealistic, therefore, to consider the block effects as random variables in our analysis. In the terminology of Chapter 36, we are therefore about to consider a mixed model, with treatment effects fixed and block effects random. This not unnaturally leads to a different analysis, which is usually called recovery of inter-block information. The analysis which follows is not confined to BIB designs, but holds for any block experiment.


Mixed model for the recovery of inter-block information

38.57 We now omit the linearly dependent block parameters $\boldsymbol{\beta}$ from the model (38.6). Instead, we have a random block effect, say $\beta_j$.* If this has zero mean, it will not enter into the expected value of $\mathbf{y}$, and its variance, say $\sigma_b^2$, will be superposed upon the ordinary errors, which still have zero mean and variance $\sigma^2$. In the notation of 38.16-21, our model is then

$$E(\mathbf{y}) = \mathbf{X}\boldsymbol{\tau}, \qquad (38.107)$$

*" rs?° and use a Gre ^«^b^,TmK.; m V ari* 












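where

$$\underset{(bk \times t)}{\mathbf{X}} = \begin{pmatrix} \mathbf{X}_1 \\ \vdots \\ \mathbf{X}_b \end{pmatrix}. \qquad (38.108)$$

It will be observed that we are still assuming no interactions between the block and treatment effects. The errors in the model are no longer uncorrelated, since any two units in the same block share the same value of $\beta_j$. If we write $\rho = \sigma_b^2/\sigma^2$, the dispersion matrix of $\mathbf{y}$ is

$$\mathbf{V} = \sigma^2\begin{pmatrix} \mathbf{A} & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{A} \end{pmatrix}, \qquad (38.109)$$

with b identical matrices along its leading diagonal, where

$$\mathbf{A} = \mathbf{I}_k + \rho\,\mathbf{1}_k\mathbf{1}_k'. \qquad (38.110)$$

We suppose initially that ρ is known; we discuss its estimation in 38.62-4.

To estimate $\boldsymbol{\tau}$, we now require the generalized LS estimator, from 19.17 (Vol. 2),

$$\hat{\boldsymbol{\tau}} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}^{-1}\mathbf{y} \qquad (38.111)$$

with dispersion matrix

$$\mathbf{V}(\hat{\boldsymbol{\tau}}) = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}. \qquad (38.112)$$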

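38.58 The inverse of (38.110) is, as at (38.87),

$$\mathbf{A}^{-1} = \mathbf{I}_k - \left(\frac{\rho}{1+k\rho}\right)\mathbf{1}_k\mathbf{1}_k', \qquad (38.113)$$

and using (38.108-9) and (38.113), we have

$$\mathbf{X}'\mathbf{V}^{-1}\mathbf{X} = \frac{1}{\sigma^2}\sum_{j=1}^{b}\mathbf{X}_j'\mathbf{A}^{-1}\mathbf{X}_j. \qquad (38.114)$$

Substitution of (38.5) and (38.1) reduces this to

$$\mathbf{X}'\mathbf{V}^{-1}\mathbf{X} = \frac{1}{\sigma^2}\left\{\operatorname{diag}(\mathbf{r}) - \left(\frac{\rho}{1+k\rho}\right)\sum_{j=1}^{b}\mathbf{n}_j\mathbf{n}_j'\right\},$$

and since it may be verified from the definitions that

$$\sum_{j=1}^{b}\mathbf{n}_j\mathbf{n}_j' = \mathbf{n}\mathbf{n}', \qquad (38.115)$$

(38.114) finally becomes

$$\mathbf{X}'\mathbf{V}^{-1}\mathbf{X} = \frac{1}{\sigma^2}\left\{\operatorname{diag}(\mathbf{r}) - \left(\frac{\rho}{1+k\rho}\right)\mathbf{n}\mathbf{n}'\right\}, \qquad (38.116)$$

and the dispersion matrix of the estimators (38.112) is the inverse of (38.116). $\mathbf{X}'\mathbf{V}^{-1}\mathbf{y}$ may be evaluated similarly, and on substituting (38.18) and (38.1), this is

$$\mathbf{X}'\mathbf{V}^{-1}\mathbf{y} = \frac{1}{\sigma^2}\left\{\mathbf{T} - \left(\frac{\rho}{1+k\rho}\right)\mathbf{n}\mathbf{B}\right\}. \qquad (38.117)$$

Using (38.116-7), the MV estimator (38.111) is

$$\hat{\boldsymbol{\tau}} = \left\{\operatorname{diag}(\mathbf{r}) - \left(\frac{\rho}{1+k\rho}\right)\mathbf{n}\mathbf{n}'\right\}^{-1}\left\{\mathbf{T} - \left(\frac{\rho}{1+k\rho}\right)\mathbf{n}\mathbf{B}\right\}. \qquad (38.118)$$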
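The two matrix results used above, the inverse (38.113) and the reduction of $\sigma^2\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}$ to the form (38.116), may be checked in exact arithmetic. In the sketch below (ours, for illustration only) the design is one with t = b = 7, k = 3, the value ρ = 1/2 is an assumption, and the sum over blocks of $\mathbf{X}_j'\mathbf{A}^{-1}\mathbf{X}_j$ is accumulated element by element.

```python
from fractions import Fraction

BLOCKS = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
t, k = 7, 3
rho = Fraction(1, 2)              # hypothetical variance ratio
w = rho / (1 + k * rho)           # the multiplier rho/(1 + k*rho)

# A = I + rho*11' and its claimed inverse I - w*11', as at (38.110) and (38.113).
A = [[(i == j) + rho for j in range(k)] for i in range(k)]
A_inv = [[(i == j) - w for j in range(k)] for i in range(k)]
product = [[sum(A[i][m] * A_inv[m][j] for m in range(k)) for j in range(k)]
           for i in range(k)]

# sigma^2 X'V^{-1}X: the unit in position p of a block carrying treatment i
# contributes A_inv[p][q] to element (i, l), where l is the treatment in position q.
lhs = [[Fraction(0)] * t for _ in range(t)]
for block in BLOCKS:
    for p, i in enumerate(block):
        for q, l in enumerate(block):
            lhs[i][l] += A_inv[p][q]

# diag(r) - w*nn', the bracketed matrix in (38.116).
nn = [[sum(1 for blk in BLOCKS if i in blk and l in blk) for l in range(t)]
      for i in range(t)]
rhs = [[(nn[i][l] if i == l else 0) - w * nn[i][l] for l in range(t)]
       for i in range(t)]

identity_k = [[int(i == j) for j in range(k)] for i in range(k)]
print(product == identity_k, lhs == rhs)
```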


-0 


■S 

is 


Q Il8) J'T 1 

(p 0)» P. k * {ding ( f )}' ’ precisely the result; 

38 - 59 ^ 

ana -he ^ ^ ^ ^ 

which we . as 0 fi ^ /?) ’ //&} is singular? as may 

• it*.—* 1-!... .. 

k \ trix inversion, since now ^ * >4) . Instead 1 

multiplying it ^ 1. and us,ng 1 (38.119) 

be seen by P os _ , j t h a t the com 

the "tore general {diag ( r )-»n /*} ( .^ (38 . 2 9)), provided^^ ^ 

,, 8 119) is satisfied by the estimator rj (^ rf (38 .119) estimate, 

JSriS-n “SmTSA* ,««■ -• “i,„d i. IS. mbs 

affects the estunatots s b J Mocl estimators re effects, they must 

I, is easy to see tha to ^ ^ for my fo ed se tot D y ^ ^ ^ 

jtiSSKr so when the ^^^LTfclodt analysis since the change 
re lame one is then tempted to use the sunp HoweV er, the dispersion matrix 

gators will, as we hav.^ ~„f (38.40). 
f the estimators is now the mv 

r _J^bvn'rrArl l-\1nr*L r c tnfu*o !r>_ 


t 


\ 


f 


38.60 In the case of randomized blocks there is no change at all in the estimators obtained by use of the mixed model rather than the fixed-effects one. Exercise 38.14 shows that the estimators (38.118) coincide with the intra-block estimators (38.48) for randomized blocks, but that the dispersion matrix of the estimators is changed in the mixed model, their variances being increased, as is obvious from the fact that any treatment parameter estimator $T_i/b$ is the mean of b independent observations with variance $\sigma^2(1+\rho)$ by (38.109-10).


38.61 Yates’ (1939,1940a) original treatment of the recovery of inter-block informa¬ 
tion proceeded differently, by observing that inter-block estimators of treatment nan" 
l cou ! d b , e obtained front the block totals, that these estimators were uncorrelated 

waghted gi re the sma „ est attainab , e vari ° f ™ ° C0U ? *««*»» be stmply 
version of this approach is left to the r^A t? ^ .^ 1S met lc "^—^ le generalized 
approaches to relry^rf in^-Sock 38 ' 15 - The two different 

« to the same estimator in the BIB case alth ° U 8 h the J 

t zeroise 38.16). The reason is that 






\ 
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. . _ n f the two components in Exercise 38 15 i« , * . 

*>f££** ma,riX 0f *e intra-block estimators in the origTnalTfixedl? n 
10 Tl «h 5 e the dispersion matrix of the inter-block estimators is for the E ? 
^ mixed-model dispersion matrix were used for the intra-block estiEmto 
1 to methods would become identical, as is obvious from the fact that t at (38 im 
'ifofonetion of T and B only. ^- U8 > 

38.62 Both the MV estimator $\hat{\boldsymbol{\tau}}$ at (38.118) and the estimator in Exercise 38.15 are functions of the variance ratio $\rho = \sigma_b^2/\sigma^2$, and this must usually itself be estimated by some $\hat{\rho}$, so that the corresponding estimator with $\hat{\rho}$ substituted for ρ may be used. We first estimate $\sigma_b^2$ and $\sigma^2$ separately and then take the ratio as the estimator of ρ.

To find suitable estimators of $\sigma_b^2$ and $\sigma^2$, we return to the general analysis of 38.19-24, but we now wish to find an SS attributable to blocks rather than to treatment differences, as in 38.25. We therefore find the Residual SS, say $S_\beta$, when there are no block effects. The difference $S_\beta - S_0$ will then be the SS attributable to block effects.

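38.63 We put $\hat{\boldsymbol{\beta}} = \mathbf{0}$ in (38.32), and obtain

$$\mathbf{B} = \mathbf{n}'\hat{\boldsymbol{\tau}}_0, \qquad (38.120)$$

where $\hat{\boldsymbol{\tau}}_0$ means $(\hat{\boldsymbol{\tau}})_{\beta=0}$. Premultiplying by $\mathbf{1}_b'$ gives, from (38.25) and (38.3),

$$G = \mathbf{r}'\hat{\boldsymbol{\tau}}_0. \qquad (38.121)$$

If we now substitute (38.120-1) into (38.24), we obtain

$$\hat{\boldsymbol{\tau}}_0 = \boldsymbol{\Omega}\{\mathbf{T} - \mathbf{n}\mathbf{n}'\hat{\boldsymbol{\tau}}_0/k + \mathbf{r}\mathbf{r}'\hat{\boldsymbol{\tau}}_0/(bk)\} = \boldsymbol{\Omega}\{\mathbf{T} + [\boldsymbol{\Omega}^{-1} - \operatorname{diag}(\mathbf{r})]\hat{\boldsymbol{\tau}}_0\} \qquad (38.122)$$

on using (38.23). Solving (38.122) for $\hat{\boldsymbol{\tau}}_0$ gives

$$(\hat{\boldsymbol{\tau}})_{\beta=0} = \{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{T}. \qquad (38.123)$$

We then have, in (38.33),

$$S_\beta = \mathbf{y}'\mathbf{y} - \mathbf{T}'\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{T}, \qquad (38.124)$$

and we find

$$S_\beta - S_0 = \mathbf{T}'\hat{\boldsymbol{\tau}} + \mathbf{B}'\hat{\boldsymbol{\beta}} - \mathbf{T}'\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{T} = (\mathbf{T} - \mathbf{n}\mathbf{B}/k)'\boldsymbol{\Omega}(\mathbf{T} - \mathbf{n}\mathbf{B}/k) + \mathbf{B}'\mathbf{B}/k - \mathbf{T}'\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{T}, \qquad (38.125)$$

using the first row of the AV table (38.39). (38.125) is the required SS attributable to blocks. We thus have the AV table, alternative to (38.39).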

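    Source of variation                    SS                                                              D.fr.

    Block effects                          $(\mathbf{T}-\mathbf{n}\mathbf{B}/k)'\boldsymbol{\Omega}(\mathbf{T}-\mathbf{n}\mathbf{B}/k) + \mathbf{B}'\mathbf{B}/k - \mathbf{T}'\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{T}$     b - 1
    (allowing for treatment differences)
    Treatment differences                  $\mathbf{T}'\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{T} - G^2/(bk)$                      t - 1
    (ignoring block effects)
    Residual                               $\mathbf{y}'\mathbf{y} - \mathbf{T}'\hat{\boldsymbol{\tau}} - \mathbf{B}'\hat{\boldsymbol{\beta}}$                         bk - b - t + 1
    General mean                           $G^2/(bk)$                                                      1
    Total                                  $\mathbf{y}'\mathbf{y}$                                                          bk

                                                                                                          (38.126)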
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oF statistics 

the ADVANCED THEORV 0 " as they must be but qj , 

THE n rows are unchang ment and block eff ect r 

. ra l mean . j between taken into acco llT1 S i 

e^ly « block e ?fl S rHer is irrelevant onto?* I 1 


remaining Sb has differences, r aU1 35 4 that the or i n 

because here treatment d* ^ Examp le 35 . 


sidual SS S 0 divided by its 1 


remaining bo differences, 

because here treatment^ 
first. It will be remembere 

an orthogonal ana ys>s- ^ by the Res . 

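38.64 As usual, we may now estimate $\sigma^2$ by the Residual SS $S_0$ divided by its d.fr., since $E(S_0) = (bk-b-t+1)\sigma^2$.

To estimate $\sigma_b^2$, we first observe from (38.124) that the SS due to Blocks ($S_B$, say) may be written

$$S_B = S_\beta - S_0 = \mathbf{y}'\mathbf{y} - S_0 - \mathbf{T}'\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{T}, \qquad (38.127)$$

so that, using (38.127),

$$E(S_B) = E(\mathbf{y}'\mathbf{y}) - (bk-b-t+1)\sigma^2 - E[\mathbf{T}'\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{T}]. \qquad (38.128)$$

From the result of Exercise 19.3, we know that if $E(\mathbf{z}) = \mathbf{0}$ and $\mathbf{V}(\mathbf{z}) = \sigma^2\mathbf{W}$, $E(\mathbf{z}'\mathbf{A}\mathbf{z}) = \sigma^2\operatorname{tr}(\mathbf{A}\mathbf{W})$. Thus, assuming without loss of generality that $\boldsymbol{\tau} = \mathbf{0}$, we have from (38.109-10)

$$E(\mathbf{y}'\mathbf{y}) = \sigma^2 bk(1+\rho). \qquad (38.129)$$

The model (38.107-10) implies that

$$E(\mathbf{T}) = \{\operatorname{diag}(\mathbf{r})\}\boldsymbol{\tau}, \qquad \mathbf{V}(\mathbf{T}) = \sigma^2[\operatorname{diag}(\mathbf{r}) + \rho\,\mathbf{n}\mathbf{n}'],$$

remembering the properties of $\mathbf{n}\mathbf{n}'$ in 38.48. We may thus again apply the result of Exercise 19.3 to obtain

$$E[\mathbf{T}'\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{T}] = \sigma^2\operatorname{tr}[\{\operatorname{diag}(\mathbf{r})\}^{-1}\{\operatorname{diag}(\mathbf{r}) + \rho\,\mathbf{n}\mathbf{n}'\}] = \sigma^2\operatorname{tr}[\mathbf{I}_t + \rho\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{n}\mathbf{n}'] = \sigma^2\{t + \rho\operatorname{tr}[\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{n}\mathbf{n}']\}.$$

It is easy to verify from the definitions that

$$\operatorname{tr}[\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{n}\mathbf{n}'] = t, \qquad (38.130)$$

so

$$E[\mathbf{T}'\{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{T}] = \sigma^2 t(1+\rho), \qquad (38.131)$$

and (38.129) and (38.131) reduce (38.128) to

$$E(S_B) = (b-1)\sigma^2 + (bk-t)\rho\sigma^2. \qquad (38.132)$$

Thus, from (38.132) and the estimator $\hat{\sigma}^2 = S_0/(bk-b-t+1)$ of $\sigma^2$,

$$\hat{\sigma}_b^2 = \frac{S_B - (b-1)\hat{\sigma}^2}{bk-t}. \qquad (38.133)$$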


(38.133) and (38.127) give the remm-H ,• 1 . 

anoth rty in th ‘ S COnteXt ’ where e-s tim^L ™^ 0 ^: T f he ‘ r ratio ’ sa y P > hasn0 optimum 
nother, more complicated, estimator of n 2 .1 1 interest. Tocher (1952) gives 

is an 'Tl IS ‘r he MV unbiasse d quadratic ‘ f tHe err0rS have zer0 skewness 

p wh ° se - 11 zt: But what is reai1 ^ ^ 

G ^«-Week (1959) a d p . CStimator < 3SU18) * 

rayb ‘" and Ses hadri (I960) show that 4* 


-- , 
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0 , 1 / 1962 ) show that if the estimator of p i s a rat - ; n P ls used i n j t , D 

?nd Sh terms of the latent roots of nn\ unbiassedness of* qUadratic f <*ms of a cer 
design with r = and that p i s of 

Permutation distributions for BIB designs

38.65 After the general discussion of the mixed model in the preceding sections, we now
revert to our earlier fixed-effects model for BIB experiments and turn our attention to
permutation tests for treatment effects.

Ogawa (1963) shows that if there are unit errors, the usual test for treatments may be
justified as an approximation to the permutation test if b is large enough and the
variances of unit effects within blocks are constant. A fortiori, this holds if there are
no unit errors and b is large.

If ranks are used within blocks instead of the observations, we may generalize to BIB
designs the permutation distribution of the test statistic for treatment differences
discussed in 38.43 and 37.39-41 for randomized blocks. The results, due to Durbin
(1951), are given in Exercise 38.17. Van Elteren and Noether (1959) showed that,
compared to the usual F-test for treatment effects, Durbin's test using ranks has ARE
exactly $k/(k+1)$ times the Wilcoxon ARE (31.115), reducing to $3k/\{\pi(k+1)\}$ in the
normal case. It will be seen that the ARE depends on block size, but not upon t.
It is interesting to note that here only the first two moments of the test statistic
can be generally obtained, precisely because, as we mentioned at the end of 38.50,
the BIB conditions lay down no pattern for the appearance of the treatments in sets
of more than two.
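To see how quickly the ARE in the normal case approaches the Wilcoxon value $3/\pi$ as block size grows, the expression $3k/\{\pi(k+1)\}$ may be tabulated. The function name below is ours, chosen for illustration only.

```python
import math

def durbin_are_normal(k: int) -> float:
    """ARE of Durbin's rank test relative to the F-test, normal case: 3k / (pi (k+1))."""
    return 3.0 * k / (math.pi * (k + 1))

# Tabulate for a few block sizes; the limit as k grows is 3/pi.
for k in (2, 3, 5, 10, 50):
    print(k, round(durbin_are_normal(k), 4))
print("limit", round(3.0 / math.pi, 4))
```

The values increase with k and depend on no other design constant, in line with the remark that the ARE depends on block size but not upon t.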

Benard and van Elteren (1953) give a large-sample chi-square permutation test for 
an arbitrary (not necessarily balanced) incomplete blocks design using ranks, repeated 
as well as missing observations being allowed. 

Preference experiments 

38.66 BIB designs are of interest in connexion with preference experiments
(where measurements of degree of preference are often not possible, but rankings of
preferences are). If preferences are to be expressed within b blocks of k objects
(treatments) selected from t, the order in which these objects are examined may be
important, and it is desirable to arrange the BIB to take this order-effect into account.
A simple way is to let the objects be examined in the orders determined by the column
positions of a (t x t) Latin square. If the first k columns of the square are used to
determine order in t blocks of k objects, each of the t possible objects appears once
in each position in the ordering. If b = ct, where c is a positive integer, we may achieve
complete order-balance by using the first k columns of c (t x t) Latin squares in this
way.

Thus, e.g., the first three columns of (38.71) give a BIB design with t = b = 4,
k = r = 3 and $\lambda = 2$. Each of the letters A, B, C, D occurs once in each column position.
An incomplete Latin square, used in this way, is known as a Youden square design,
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„„ ADVANCED THEOK ce may als0 be import, 

z An Of course, P^^ere “position may s , 

„ w T. Youden. u experiment, w , from the experirtw, 
S* « 38.34). *>>■ 

in any (not n ,« ulsan ce squares ( cr - 





just as in our origin- . 
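The order-balance property of a Youden square is easily checked by enumeration. The sketch below does not use the square (38.71) itself, which is not reproduced here; it assumes instead a cyclic 4 x 4 Latin square in the letters A, B, C, D, takes each row as a block, and keeps the first three columns.

```python
from itertools import combinations
from collections import Counter

letters = "ABCD"
t, k = 4, 3

# Assumed cyclic Latin square: row i is the letters shifted i places.
square = [[letters[(i + j) % t] for j in range(t)] for i in range(t)]

# Each row, truncated to its first k columns, is one block; column = order position.
blocks = [row[:k] for row in square]

pair_counts = Counter(frozenset(p) for blk in blocks for p in combinations(blk, 2))
replications = Counter(x for blk in blocks for x in blk)
position_counts = [Counter(blk[pos] for blk in blocks) for pos in range(k)]

print(blocks)
print(dict(replications))
print(sorted(pair_counts.values()))
```

Every pair of letters occurs together in two blocks, every letter occurs three times, and every letter occurs once in each of the three positions, so that the design is a BIB with complete order-balance.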

Paired comparisons

38.67 The particular case k = 2 in a BIB design (when, as we saw in 38.51, the
design is unreduced) is usually described as a paired comparisons design, and is of
particular importance in preference experiments. H. A. David (1963) has recently
devoted a monograph to the method of paired comparisons, which includes a chapter
on appropriate experimental designs. Perhaps the most important of these are the
linked paired-comparison designs developed by Bose (1956). In these designs, each
of t judges compares r pairs of objects (chosen from a total of n objects). Each pair
is compared by k judges, and any two judges have exactly $\lambda$ pairs in common.
As the notation indicates, there is a correspondence with BIB designs: each judge
is a "treatment," each pair of objects a block containing k such treatments, and there
are $b = \tfrac{1}{2}n(n-1)$ blocks in all. The additional condition of the linked paired-comparison
designs is that we require each of the n objects to appear equally frequently in the
r pairs of each judge, i.e. to appear $2r/n = \alpha$ times. Thus, by (38.94), we have

    $\alpha = k(n-1)/t.$

Because of the additional condition (f ^ 4 ),^existence of. ^^erT^ 

Bose (1956) gives (and David (1963) reproduces) methods for
deriving linked paired-comparison designs from BIB designs.




Partially balanced incomplete blocks 

38.68 The essential feature of BIB designs (cf. 38.49) is complete symmetry
between the treatments, each of which appears r times in all and $\lambda$ times with any
other treatment. This symmetry was a natural consequence of the symmetrical demand
for the same precision in all treatment-difference estimators in 38.46-7. While
maintaining the condition that each treatment appears r times, we now relax the condition
that $\lambda$ be constant.

Suppose that for each treatment the remaining (t - 1) treatments fall into m classes
of size $n_p$, $\sum_p n_p = t-1$. These are called associate classes, and any treatment in
the pth associate class is called a pth associate of the given treatment. We now require
that

(a) all pth associates appear together in a block $\lambda_p$ times;
(b) if A is a pth associate of B, B is a pth associate of A;
(c) the number of treatments which are both ith associates of a treatment A,
    and jth associates of another treatment B, is the same for all pth associates
    A, B. We write this number as $p^{p}_{ij}$.

A design satisfying these conditions is called a partially balanced incomplete


\ 
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DESIGN of EXPPo, 

, . rp, , . XPEr IMENTs 

, /pglB) design, lhese designs were fi 

$**• ^ COn ‘ ain BIB 3S ** and N • "* 

38.69 PBIB designs have many constants: tr , ^7^ 

, an d the * values tj,; and the There ; r . and k as brfo 

Le constants which reduce their effective numh Ver ’ >«>ear r l' t ' he n Val «s 
,h The case of two associate classes (m = 21 h Ju ' b<ms bct *«n 

,, 054 ) who give tables of all known designs J.r " StUdied in detail h„ * 
im plest case, there are five types of association schtlf^^O- Evenln'tW 
the associate relationships between the treatments! ffl l * e sch ™e which .1* 

Guerin (1965) gives an extensive summary of th„ • sub -‘ypes. 8 
an d construction of PBIB designs, with a comprehensTvTb?' 5 °? ‘ he existe "« 
the appropriate methods of statistical analysis, including recoven, De,ai,s of 

tion, are given by Bose et al. (1954), by Kempthome (1952), alfbfc 

Structured treatments: lattice designs 

38.70 Throughout our treatment of experiment designs, we have made no assumptions
concerning relationships between treatments. Now we suppose that the treatments
in the experiment may be meaningfully classified into certain categories. This
is the case, e.g., when the treatments are the rc combinations in a two-way
cross-classification, the treatments then falling naturally into a two-way table. Block
experiments taking account of such classifications are called lattice designs, and were
introduced by Yates (1936b, 1939, 1940b). They are of particular value when the
number of treatments, t, is large, for the table in 38.54 shows how few BIB arrangements
are then available.

38.71 Suppose that the t treatments can be me^in^uUy arranged m» M 

two-way array, so that , = Ik. We might t ^s"of H 
l rows within a single block; and similaily t . . j blocks 0 f k units and 

within a single block. We thus obtain a design con ^ e ar i sin g from 

k blocks of / units. This is called a rectangular muced^ ^ ^ h * (!xft 
the fact that the treatments may be represen e . which has throughout^ 

array. To bring this within the scope of^ouri t0 

limited to blocks of equal size, we m above, it IS sometl , 1 in g ru ple la^ ce > 

k = l. With two replicates of the. T 1 i^v^vrith fow reP licat,onS ’ * 


lattice ; with three replications, a triple lattic , ^ j or although 

and The°s n q uare lattice is no_t a 
frequently^n’^he^samTblockwith ^^ 

*at the frequency of joint appear^ ^ ^ other ar«7 ^ of 1 onto 

column of the (kxk) array, \ two-dim ens ^" h ln have 
One can evidently gen*wJ« “y eatm ents. 
a ^-dimensional array containing 
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nt c Such p' am . me applications. | 

„ n Of the treaty s . raportantI n some PP 

not balanced. The cubic latfce g lack of ba , j 

just discussed. In th 1 2 3 (38. ^ 
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in a 


        1 2 3
        4 5 6                                            (38.135)
        7 8 9

yields the simple square lattice design, as described in 38.71, with six blocks

        1 4 7    1 2 3
        2 5 8    4 5 6                                   (38.136)
        3 6 9    7 8 9

arranged in two complete replications of the treatments, each column of each
replication being a block. If we now add two further replications

        1 2 3    1 2 3
        5 6 4    6 4 5                                   (38.137)
        9 7 8    8 9 7

the ensemble of (38.136-7) is fully balanced, as the reader may verify. In fact, it is a
BIB with t = 9, k = 3, $\lambda = 1$, b = 12, r = 4. The four complete replications
in (38.136-7) form a set of lattice squares, because they can be derived as in 38.71
from (38.135) and the further array

        1 5 9
        6 7 2                                            (38.138)
        8 3 4

These designs are more valuable when $k = t^{1/2}$ is odd, only $\tfrac{1}{2}(k+1)$ squares then being
required, as in our example. When k is even, k+1 squares are needed to form a set.
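The verification left to the reader may be carried out by direct enumeration. The following sketch lists the twelve blocks of (38.136-7), read as the columns of each replication, and counts how often each treatment and each pair of treatments occurs; the variable names are ours.

```python
from itertools import combinations
from collections import Counter

# The four replications of (38.136-7); each tuple is one block of k = 3 treatments.
replications = [
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)],   # rows of (38.135)
    [(1, 4, 7), (2, 5, 8), (3, 6, 9)],   # columns of (38.135)
    [(1, 5, 9), (2, 6, 7), (3, 4, 8)],   # rows of (38.138)
    [(1, 6, 8), (2, 4, 9), (3, 5, 7)],   # columns of (38.138)
]
blocks = [blk for rep in replications for blk in rep]

pair_counts = Counter(frozenset(p) for blk in blocks for p in combinations(blk, 2))
treatment_counts = Counter(x for blk in blocks for x in blk)

print(len(blocks), "blocks")
print("replications per treatment:", set(treatment_counts.values()))
print("joint appearances per pair:", set(pair_counts.values()))
```

All 36 pairs of treatments occur exactly once and each treatment occurs four times, which is the BIB with t = 9, k = 3, $\lambda = 1$, b = 12, r = 4.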

38.73 Details of the theory and analysis of lattice designs, into which we cannot
enter here, are given by Kempthorne (1952) and by Cochran and Cox
(1957). Their importance for our exposition is that they have led us to consider
a set of treatments which are "structured" at least to the extent of being arranged
in categories. If we pursue this further, to the case where the treatments
are combinations of underlying effects, we are led to factorial experiments.

Factorial experiments 

3,1 P-Me combinations 

35.15-33 and 35.«,. (COmplcte ) Such “Periments 

chapters b« au t ( ^f « did not use ft t “'"V d ' SCUSSed in detail “ 

situations). Each defi„i„!!" 8 ‘ vei > ‘here are also a ‘ m . 1 . n °l° g y »f experiments in 

etc.) is caUedSf -f the ***££ ff ^ to "--experimental 

“ the ex P er iment. Each voi K ^-variable, column- 

h v^ue wh i ch afacto ; cantalej 
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f individual cell of the marginal classification by tW r 155 

vh definf factor- Thus what we previously called a (J r ! factor . is called 
Cl of d in this context as a factorial experiment with two?? 1 ^ 6 ^ 01 ' 
- > i& cc het at £ levels; and a (rxcxl) cross-classification ■ factors > one at r 

at '• * and 1 ^ r Wec«ively More L^ 1 

1 would be called a (,•) factorial, L so 

P* C ' 

main effects and interactions. Thus, e.g., in a (r x c) factorial experiment
with one observation per cell, we should have t = rc, and the (rc-1) d.fr. for treatment
differences are to be resolved into r-1, c-1 and (r-1)(c-1) as in Example
35.3. More generally, the treatment differences SS are to be subdivided into components
for all the main effects and the different-order interactions, as in Chapter 35.
We call this subdivision an AV for treatments.

It will be seen from (38.52) that the treatment differences SS is $\frac{1}{b}\{\mathbf{T}'\mathbf{T} - G^2/t\}$,
so that $\mathbf{T}'\mathbf{T}$ now plays the same role, for a factorial experiment in randomized blocks,
as $\mathbf{y}'\mathbf{y}$ did in Chapter 35. Thus an AV for treatments may be carried out upon the
treatment totals $T_i$ by exactly the methods of Chapter 35, reading t as n. It is necessary
to remember to divide all the component SS by b, this divisor arising, of course,
because each $T_i$ is the sum of b observations.
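The rule may be illustrated numerically. In the sketch below the data are simulated and the dimensions (b = 4 blocks, a 2 x 3 factorial) are arbitrary assumptions; the two-way AV of Chapter 35 is applied to the table of treatment totals, and every component SS is divided by b.

```python
import numpy as np

rng = np.random.default_rng(0)
b, r, c = 4, 2, 3                       # blocks and the levels of the two factors
t = r * c                               # number of treatments
y = rng.normal(size=(b, r, c))          # y[h, i, j]: block h, treatment (i, j)

T = y.sum(axis=0)                       # treatment totals, each a sum of b observations
G = y.sum()                             # grand total

# Treatment differences SS: (1/b){T'T - G^2/t}.
treat_ss = ((T ** 2).sum() - G ** 2 / t) / b

# Two-way AV carried out on the totals (reading t as n), then divided by b.
row_dev = T.mean(axis=1, keepdims=True) - T.mean()
col_dev = T.mean(axis=0, keepdims=True) - T.mean()
inter = T - T.mean() - row_dev - col_dev
ss_rows = c * (row_dev ** 2).sum() / b          # r - 1 d.fr.
ss_cols = r * (col_dev ** 2).sum() / b          # c - 1 d.fr.
ss_inter = (inter ** 2).sum() / b               # (r - 1)(c - 1) d.fr.

# The same SS computed directly from the treatment means of the observations.
direct = b * ((y.mean(axis=0) - y.mean()) ** 2).sum()

print(treat_ss, ss_rows + ss_cols + ss_inter, direct)
```

The three printed values coincide, and the d.fr. add to rc - 1 as required.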

38.76 From Exercise 38.6 it will be seen that the same simple AV for treatments
in terms of the $T_i$ may be carried out for factorial experiments in Latin square designs,
the divisor here being t (instead of b) which is again the number of observations of
which $T_i$ is the sum. The same rule holds for the generalization of Exercise 38.6 to
Graeco-Latin and higher-order orthogonal square designs.

38.77 So far, the subdivision of the treatment differences SS has been simplicity
itself, because in 38.75-6 we have considered only factorial experiments (which are
of simple structure) in the simplest block designs. We now remove both these
limitations, and return to the consideration of experiments upon related treatments
in the general block design.

A glance at the treatment differences SS in the general block experiment AV table
(38.39) will show the reader that we cannot now expect the simplicity of 38.75-6 to
persist, for the allocation of treatments to blocks affects the treatment differences
SS through B. This remains true even for the BIB table (38.106), remembering the
definition of T at (38.102).

A little thought will convince the reader that this must be so, for if the treatments
do not all appear in the same blocks, their effects must become entangled with those
of blocks themselves, and balance (in the BIB sense) does not preclude this.


new problem "^cesSS^^ 

F jnents of the “f'^se Wff ‘” th blocks. 

cannot be <^ a ‘ ed „ be w n/o«»* rf y ble , we consider qui( 

They are then s,d t0 the SS £«^ parameters, say ^ 

,8 78 Rather than “f" e linear functions of th^ icular , w e are interested 

Contrasts, it will ,„ rs /^o 19-14) shows that 

between the '^"Jeffor block experiments at ( • (38.139) 

Inspection o Cr « (C , 0) ^ annihi la t e the block para. 

where 0 is a (p x (*- «> mat “ ° f ^ ’ 139) , it is necessary 

-r-in order that a vector!* be unbiassed in est. mating (38.139), 

and sufficient by (19.19) that lx = (C ; 0) 
i.e. using (38.13), 



C, 


= 0. 


(38.140) 


r 


Thus, if we can find a matrix L, of order (p x bk), which satisfies (38.140), there will
be no confounding of the p linear functions $\mathbf{C}\boldsymbol{\tau}$.

38.79 The equations (38.140) impose p(t+b- 1) conditions upon the pbk elements 
of L, and whatever p may be, these can certainly be satisfied if t + b-\<hh i a if 

** also be satisfiablc for some values of p J 

^riv“r;s:s d T;,- - xstzz i 

We therefore see that if there (. Wfl , ,, f 1Vla ln ^ ls context. 

confounding any set of li„W functions oTth 8 e h t b °r CkS ’ WC ^ aIways ’ if we wish > avoid ' 

weT f Z the fact tha « if ceSfunction,7 Parameters - is intuitively ' 

This ^f beritely add a f »rther set of bloclT “ nfo . unded 111 a given set of blocks, 

" Part,cuIar w »>ere each set of'block t “ ^ they are not confounded. 

1S a re P ^ ca lion of the treatments. In 
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re functions confounded in part, but 


li te f ^ ’confounded- 


n °t all, of the 


ex Peri m , 


- 
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j 


t 


**&"*’-" .. cnt are said to 

„ sn W e have SO far discussed confounding as though it We „ 

38.8“ .htedlv a nulsance t0 be unab le to estimate JL7^ . anevil tobeav„ia , 

1« u i, and even in the case of partial confounding ther"^ ° f ' he ‘™me* 
^<5 which may be irksome, while naturally the precision of S ' om PWntio na l com- 
P licatl ° confounded functions must be reduced. 1 the estl mators of the 

Lrtially c confounding also has its positive asDect r * . , 
.r^embered from 35.44 that the higlJoXIt 
«' 1 ,, be of little practical value. They are therefore often deliberafeT C °!“ monl J' 
f0 whe blocks in an experiment, the consequence being that their SS12 !? f nf0Unded 
WltK ^ of the Residual. Of course, we may carry out precisely the t ^ appear 
® par : n an unconfounded analysis, as indicated at 35.44 The Doint h merg ! ng 

all the desired linear functions of the treatment parameters, namely the main effects
and interactions. If some of these must be confounded, it is in general advantageous
to start with the highest-order interactions and confound as few of the main effects
and first-order interactions as possible.

To this end, Fisher (1942) proved, using Abelian group theory, that in a factorial
experiment with $2^m - 1$ factors each at two levels (the $2^{2^m-1}$ factorial), no main effect
or first-order interaction need be confounded in a single replication of the treatments,
provided that $k \geq 2^m$, i.e. if block size exceeds the number of factors. He later
extended his treatment to factors with p levels, where p is a prime (cf. also
MontW1949Yl). Kempthorne (1952) gives a very detailed treatment of the subject,
including factors with different numbers of levels. Cochran and Cox (1957) discuss
the applications with detailed plans of confounded block arrangements. Yates (1937)
gives many examples, with applications in agricultural experimentation, while Davies
et al. (1954) give examples in industrial experiments.

38.81 One of the important applications of confounding^ -/ or der that 

factorial experiments, where certain interac ion estimated by using only some 

the remaining main effects and interacaons m y ^ m essen tial , 0 enable 
of the blocks of a confounded block design. ■ £ able effects, which are called 

the analysis to distinguish between otherwise in g some discussion appears 

aliases. The theory is treated by Kem P tb ° r e 4 VdiscusWrac^^ 

in Cochran and Cox (1957). Davies et al. are often nseh.1 when large 

in their application to industrial experimen , 

numbers of factors are to be tested. oer ;_ 

available to the expen 

38.82 Many other confounded faCt a " a ’J e trby a asaig ning «en 

menter. Split-plot designs confound mai soinetime 5 |abora te, forms 

to the unite in a block, an oth er, more 

necessary for the practical conduct o 




statistics 

„ rn T HE° rY 0 „ re s pl aid “ e 

1511 .• in fe tori!d eXP ecialized bo° ks 

0 f confound' 1 ^, d in *e sp eia tion phas is on the problej 

c ribed and pVO iution ar ^ ? „orticul ar ein ^ pvoeriments fn 


^}2&£k3SZ2- 


bo« k b y. Q “ f e 1ong- te “ n ^omH 1952 ^ ent ly is the use of a sequent 

of Sing ‘ he , yi ^ 

iso Cochran field which of factors cost> or equivalently 

A different but ^ ^ optimum c * gf g process fo „ iU r/a« to the ex. 

„f experimenB *| of the end-pr is t0 fit a along a path 0 f 

(or some other q J d yie ld. T 0 f expert^ , d t0 investigate 

!o “ iniffli f e C 1 by LS, and to mov thej^ ^ is then explor ^ ^ ^ 

sSssafciSt ;~2 

- -—- 

and Behnken L \ Di by Box (1957). „ au ential methods—they have 

^Evolutionary operation methods ^P^have been criticized on this score, 

bEt^t" 'St^ dreir practical importance. 


We conclude this chapter with an account of the design of experiments whose
object is the estimation of a regression relationship. Example 28.4 dealt with the
problem of designing a simple linear regression experiment. We found there that we
could minimize the sampling variances of the LS estimators of the two regression
parameters by making half of the observations as far below the origin as possible, and
the other half the same distance above the origin. We remarked there that this
corresponds to the fact that a straight line is most efficiently "fixed" by its end-points.
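The effect of end-point allocation on the sampling variances is easily seen numerically. The sketch below assumes homoscedastic uncorrelated errors with $\sigma^2 = 1$, an interval (-1, +1) centred on the origin and n = 10 observations; the function name and the uniform comparison design are ours.

```python
import numpy as np

def ls_variances(x, sigma2=1.0):
    """Variances of the LS intercept and slope for regressor values x."""
    X = np.column_stack([np.ones_like(x), x])
    return sigma2 * np.diag(np.linalg.inv(X.T @ X))

n = 10
end_points = np.array([-1.0] * (n // 2) + [1.0] * (n // 2))   # half at each end
uniform = np.linspace(-1.0, 1.0, n)                            # equally spaced

print("end-points:", ls_variances(end_points))
print("uniform:   ", ls_variances(uniform))
```

Both designs are centred on the origin and so give the intercept variance $\sigma^2/n$, but the slope variance $\sigma^2/\sum x^2$ is smallest for the end-point design, where it is also $\sigma^2/n$.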

Consider now the more general problem of allocating values to the regressor x
when the expected value of y is a polynomial in x. We have treated the theory of
polynomial regression in 28.16-20, taking the values of x for granted; now we ask
how to choose values of x so that the parameters of the polynomial regression
equation are optimally estimated.

» feed interval, which 

extends to ihe polynomial case, for we know’tW ’ i lntuitlve ar gument above ( 
xe f ^ (A+l) points. Moreover one Qt’ll & P° y nomia l °f degree k can be 

~ rt “ ° U * ht ” ‘o k e h end of h^^ in ‘ his ease that 

y * ‘He general This intuitive argument 

egression: Kiefer (1959) shows 







^HIMENTs 

lk + 1) distinct va ^ ues °f * are required n f i 

U P rovided that We ignore anai; ti a u Cha ‘^t ( ^ n 159 

^hich disappear as n —> co. Characti ^P'i^tio; ^ ^m.erio, 
iC'^se complications mto account, is not so simni • We ''«, thT * 
5*55 «o hoW here. ^‘^ve 

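38.86 We now consider the choice of the (k+1) distinct observation points
$x_1 < x_2 < \dots < x_{k+1}$ to minimize the generalized variance of the estimators of
the parameters. In the polynomial case we have

    $\mathbf{y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\epsilon}$

where the matrix X has the form

    $\underset{(n \times (k+1))}{\mathbf{X}} = \begin{pmatrix} \mathbf{1}_{n_1} & x_1\mathbf{1}_{n_1} & x_1^2\mathbf{1}_{n_1} & \cdots & x_1^k\mathbf{1}_{n_1} \\ \mathbf{1}_{n_2} & x_2\mathbf{1}_{n_2} & x_2^2\mathbf{1}_{n_2} & \cdots & x_2^k\mathbf{1}_{n_2} \\ \vdots & \vdots & \vdots & & \vdots \\ \mathbf{1}_{n_{k+1}} & x_{k+1}\mathbf{1}_{n_{k+1}} & x_{k+1}^2\mathbf{1}_{n_{k+1}} & \cdots & x_{k+1}^k\mathbf{1}_{n_{k+1}} \end{pmatrix},$        (38.141)

$\mathbf{1}_{n_i}$ being a column vector of $n_i$ units, where $n_i$ observations are taken at the
point $x_i$, $i = 1, 2, \dots, k+1$, and $\sum_i n_i = n$.

Thus the dispersion matrix of $\hat{\boldsymbol{\theta}}$ is

    $V(\hat{\boldsymbol{\theta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{Z}\mathbf{N}\mathbf{Z}')^{-1}$        (38.142)

where

    $\underset{((k+1) \times (k+1))}{\mathbf{Z}} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_{k+1} \\ x_1^2 & x_2^2 & \cdots & x_{k+1}^2 \\ \vdots & \vdots & & \vdots \\ x_1^k & x_2^k & \cdots & x_{k+1}^k \end{pmatrix},$        (38.143)

    $\underset{((k+1) \times (k+1))}{\mathbf{N}} = \begin{pmatrix} n_1 & & 0 \\ & \ddots & \\ 0 & & n_{k+1} \end{pmatrix}.$        (38.144)

We therefore see that the effect of having only (k+1) distinct observation points
is to make $\mathbf{X}'\mathbf{X}$ a product of square matrices. Hence

    $|\mathbf{X}'\mathbf{X}| = |\mathbf{Z}|^2\,|\mathbf{N}| = |\mathbf{Z}|^2 \prod_i n_i,$        (38.145)

and if the generalized variance $|V(\hat{\boldsymbol{\theta}})|$ is to be minimized, (38.145) must be
maximized. We carry out the maximization in two stages. First,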




•*) 


$|\mathbf{N}| = \prod_i n_i$ is maximized, for fixed n, when the $n_i$ are all equal, whatever
the values of the $x_i$.

38.87 Now the reader may verify that

    $|\mathbf{Z}| = \prod_{i>j}(x_i - x_j),$

so that

    $|\mathbf{Z}|^2 = \prod_{i>j}(x_i - x_j)^2.$        (38.146)

It is obvious at once from the form of (38.146), a product of squares, that it can
always be increased by moving the extreme observation points to the ends of the
interval, and that it will always be greater for a larger interval. The observations
should therefore be spread over the largest possible interval in which observations
are to be made.
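The factorization of $\mathbf{X}'\mathbf{X}$ and the determinantal results (38.145-6) may be checked numerically for any particular allocation. In the sketch below the observation points and the $n_i$ are arbitrary values chosen for illustration, with Z taken to have the powers of the $x_i$ as its rows, as in (38.143).

```python
import numpy as np
from itertools import combinations

k = 3
x = np.array([-1.0, -0.4, 0.3, 1.0])      # hypothetical k+1 distinct points
n_i = np.array([2, 3, 1, 4])              # hypothetical numbers of observations

# X has n_i identical rows (1, x_i, x_i^2, ..., x_i^k) for each point x_i.
X = np.vstack([np.tile(xi ** np.arange(k + 1), (m, 1)) for xi, m in zip(x, n_i)])

Z = np.vander(x, k + 1, increasing=True).T    # rows are powers, columns are points
N = np.diag(n_i.astype(float))

lhs = np.linalg.det(X.T @ X)
vandermonde_sq = np.prod([(x[i] - x[j]) ** 2
                          for i, j in combinations(range(k + 1), 2)])
rhs = vandermonde_sq * np.prod(n_i)

print(np.allclose(X.T @ X, Z @ N @ Z.T), lhs, rhs)
```

The two determinants agree, so that the allocation problem separates into the choice of the $n_i$ and the choice of the $x_i$.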

Taking this interval to be (-1, +1), we now solve for the maximum of (38.146) for
the smaller values of k. For k = 1, (38.146) is $(x_2 - x_1)^2$, maximized at $x_1 = -1$,
$x_2 = +1$, as already found in Example 28.4. For k = 2, with $x_1 = -1$ and
$x_3 = +1$, (38.146) becomes

    $|\mathbf{Z}|^2 = 4(1 - x_2^2)^2.$

This is maximized at $x_2 = 0$, and $\pm 1$ must obviously be the other observation points.
In the cubic case, symmetry gives $x_2 = -x_3$, and (38.146) becomes

    $|\mathbf{Z}|^2 = 16x_3^2(1 - x_3^2)^4,$

which is maximized when $x_3^2 = \tfrac{1}{5}$.

In the quartic case, symmetry locates $x_3$ at zero, and we require only $x_4$ and
$x_2 = -x_4$. (38.146) becomes

    $|\mathbf{Z}|^2 = 16x_4^6(1 - x_4^2)^4,$

maximized when $x_4^2 = \tfrac{3}{7}$.

These results, and the next two, which are as many as are needed in practice,
are given in the following table:

    Degree of          Observation points in (-1, +1) at which
    polynomial, k      n/(k+1) observations should be made

        1              ±1
        2              ±1, 0
        3              ±1, ±0.4472
        4              ±1, ±0.6547, 0
        5              ±1, ±0.7651, ±0.2852
        6              ±1, ±0.8302, ±0.4689, 0
                                                            (38.147)
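The entries of (38.147) can be reproduced from the zeros of the derivative of the kth-order Legendre polynomial, together with the end-points $\pm 1$. The sketch below takes that characterization as its starting point and uses NumPy's Legendre class; the function name is ours.

```python
import numpy as np
from numpy.polynomial import Legendre

def optimum_points(k):
    """End-points plus the zeros of the derivative of the Legendre polynomial P_k."""
    interior = Legendre.basis(k).deriv().roots()
    return np.sort(np.concatenate(([-1.0], np.real(interior), [1.0])))

for k in range(1, 7):
    print(k, np.round(optimum_points(k), 4))
```

For k = 3 and k = 4 the interior points are $\pm\sqrt{1/5}$ and $\pm\sqrt{3/7}$ (with 0), agreeing with the direct maximization of (38.146).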









DESIGN OF EXPERIMENTS 
U „A (1958) showed that the optimum ob ' 

H° ' in terms of thp T ~-. 


38.8’ H ° e xp« ssible in terms 0f the Le g e "Se° poly P°1m 5 which . 

(^nr^he * oSrt e 


' G uest ^ va riance of the fitted polynomial 1 at ° pt !; ma %, the in 

[ 3 !'le itt^ inl g optimum values; he showed that the optS P ° mt in the intervT Za ^ n 


Ji Of the derivative 01 me «tn order polyno^’™^givenexp,^ 

| b y the* d b eeri considered in a paper by K. Smith (19m i" 16 criteri °n of 

V 147) - ThiS WaS ap P arent tf the &ot designS?.** ^hted 
r iral ueS : • -11 more surprising- that tho - 8 ^ DleiIi to be snlt»^ 



I oP° n i ies (38.1 4/ h / uie nrst design problem + 1 calc ulated 

) the ! a '“S it is «“ the m0re 7 p —gthat the paper was more m W f' S °' Ved in 
detail. Smith's paper also contains a series of charts comparing the variance
of the fitted polynomial throughout the interval when the observations are made (a)
by the optimum method; (b) by the method of uniform spacing of observations, which
is better in the centre of the interval but much worse at the extremes; and
(c) by method (b) with an additional group of observations at each end of the interval,
which removes the worst effects of purely uniform spacing. The advantage of method (b),
of course, is that it does not presuppose any knowledge of k, and enables the
experimenter to investigate its value from the observations, whereas the optimum method
of allocation cannot be used to investigate a higher value of k; this is precisely the
point which we made in Example 28.4 for the linear case. It seems wise, in any case,
to use the values in (38.147) corresponding to the highest value of k which the
experimenter would be willing to consider.


K. Smith (1918) goes on to consider the effect of heteroscedasticity of errors on the
optimum allocation. Hoel (1958) considers some special cases of correlated observations.




I .Q po Other criteria of optimality have also been used. Hoel and Levine (1964) 

' 3 id’er the allocation of observations in polynomial regression to minimize the variance 

consider the al oea - fi d po i n t outside the interval of observation (-1, +1), 

of the fitted polynomial at a specme p minimizes the maximum variance 

and it transpires that this optimum aUo “ 1 “ on “by the value «. Gaylor and 
over an interval ( —1> *) r a certain con variance, and an average variance, 

Sweeny (1965) consider minimizing q{ observ ation) for the linear case 

over any interval (arbitrarily rented to th observation points in the 

only. H. A. David and Arens ( 1959 ) “""o" or maximum <&?**?* 
linear ease to minimize expected mean-squ^ since the pos^ 

error, the latter differing from ~Jg^ be 2 

is allowed that the true . n959) not confined to P w of criteria 

general paper by Kiefer and Wo ow ^ mum allocations using a tQ 0 h ta in 

uses game-theoretic results to C0I ^P^ ^ u oe ^ (1965 a) appd es t eSC m tr j c regressions, 
(see also the summary in Kiefer (1 ))• polynomial and trlg0 ^ w hich minimize 

optimum allocations for two-dimensio polynomial regl ’ eS , ^ ing immediate y 

Hoel (1965b) finds the designs m 

the variance of the fitted value foi x . are made, an a tra p 0 lation m 
between two intervals in which observantly ^ ^ for ordl nary 
designs in bivariate polynomial tegres 
bivariate case. 






EXF.R CIS ,.q ggj for the SS attributabj 

expt«s ions ( 

. 0 f the two exp 

„. 38 . 25 , verify ^„ divi de the treatment parang 

38 -‘ *' differences m » shoW that jf W>■* of different groups, e, c(| J 

*” ^3^By 6 enera ^ z * I) pd 1 req^^ D ^^ )0 ^ ,I tmente n ftrnti'a^ V ^ : ^smaUer Uod^ eXPenme ' >t f 

££sz -* <—■ 

mag be resolved m m ake (he van . 


(Tocher, 1952) 

J ekmus<» n '“;‘""asetotind'P“- 

mW be r “° Ve bl ct experiment is I 

« design withinciden * ! 

(Tocher, 1952) 

matrix given by &.*»■ 


^ , (384M1) for random hiochs designs. 

38 , Verify .be -*“>* f °”“ g ^ ^ ^ ^ of bIocks classified in a two-way | 
taW f L V —te m AV table 'as a. (38.39). 


38.5 Verify the totmunwu^.^ ^ ^ 

r:r e :. - of 3,33, show that the AV tah.e for the L afin square design 

in 38.35 is:

    Source of variation      SS                                          D.fr.

    Treatment differences    T'T/t - G^2/t^2                             t - 1
    Rows                     R'R/t - G^2/t^2                             t - 1
    Columns                  C'C/t - G^2/t^2                             t - 1
    Residual                 y'y - (T'T + R'R + C'C)/t + 2G^2/t^2        (t - 1)(t - 2)
    General mean             G^2/t^2                                     1

    Total                    y'y                                         t^2
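The Residual SS in this table may be checked against the residual SS of a direct LS fit of the additive rows, columns and treatments model. The sketch below uses simulated observations and assumes a cyclic Latin square of order t = 4; neither is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
t = 4
rows, cols = np.indices((t, t))
treat = (rows + cols) % t                 # assumed cyclic Latin square
y = rng.normal(size=(t, t))               # simulated observations

G = y.sum()
R = y.sum(axis=1)                         # row totals
C = y.sum(axis=0)                         # column totals
T = np.array([y[treat == i].sum() for i in range(t)])   # treatment totals

cf = G ** 2 / t ** 2                      # General mean SS
ss_treat = T @ T / t - cf
ss_rows = R @ R / t - cf
ss_cols = C @ C / t - cf
ss_resid = (y ** 2).sum() - (T @ T + R @ R + C @ C) / t + 2 * cf

# Independent check: residual SS from an LS fit of the additive model.
def indicators(labels):
    return (labels.ravel()[:, None] == np.arange(t)).astype(float)

Xd = np.hstack([indicators(rows), indicators(cols), indicators(treat)])
coef, *_ = np.linalg.lstsq(Xd, y.ravel(), rcond=None)
ls_resid = ((y.ravel() - Xd @ coef) ** 2).sum()

print(ss_resid, ls_resid)
```

The two residual sums of squares agree, and the five components add to $\mathbf{y}'\mathbf{y}$ with d.fr. adding to $t^2$.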


38.7 Show, as stated in 38.39, that no more than (t-1) Latin squares of order t can be mutually
orthogonal.


38.8 Verify that the inverse of (38.79) is 

!i-. = D-.-D-t( w i, 1 )/ , 1 _ + i'?;^; -t'D-i, \/ 1 ;\ 

d . , l-w'D-.w!7 + .; D :r w j K>-‘/a, 

: determinant of the 2 x 9 ww..;,, „ / \ W / 


where A is the determinant of the 2x2 mat ^ ' 1 + 1 * D '"'W ' 

0 s or any D. nx a °ve. Hence show through (38.23) that (38.85) 

(Tocher, 1952) 


38.9 Show that if D is det • . (Tocher, 1952) 

■is-. ®„... a „,™«a ... 

1C * here are block effects to eliminate), 































(3f 8 f > ceS 

tb‘ s 


/<lft o 5 ), the BIB design equation. If th _ , 

^e'Udomized blocks design (cf. Exercise *8.^ efficie i*y 

rterposed complete set of orthogonal Latin (T ° cher . 1952) 

h of the T 2 cells of the arrangement has {T+TT S of order r ( 

:h nd its position in each of the (T-1) « -LiIk* .^rences” LH 08.76) 
l > a .„n take T distinct values. Show tV, a A bets ’ forming tv. entlf ym g i ts 


for * c ol^“ ; ce can take i v<uu«*. Show that if ea ch r mi 

<’«£> 'tot*' ceUs ,0 one a set cb ' we 

& •* « - n * - nr+1), » - T, ,. T+li ™' d ; 

9 . of (38.76), the resulting BIB rW™ • 


,^t»“* ber 


Block n uC 

.rreat^ 8 


o ft = i, r ^ t + \ i _ Wliri 

the case of (38.76), the resulting BIB design is/ * 

nx[H .ill 


9 13 1 2 3 4 1 2 3 ' 4 o ,'T'+"- 

10 14 6 7 8 6 5 8 7 7 s 3 /l V *' 2 3 4 

11 15 9 10 11 12 11 12 9 10 12 11 in oLn' 1 6 

n 16 13 1 5 l! }Sji» « » 

3WS Columns Roman letters Greek letters 


Obt^ 1 


,ed from 


Rows Columns Roman letters Greek letters 


38.11 In Exercise 38.10, show that if we augment each of the T blocks obtained from a
particular reference by a further treatment, where a different such treatment is used for each
distinct reference, and finally add a further single block containing all the (T+1) further
treatments, we obtain a symmetric BIB design with $t = b = T^2 + T + 1$, $k = r = T + 1$, $\lambda = 1$.
Verify this augmented design for the design derived from (38.76) in Exercise 38.10.

By considering the case T = 6 (cf. 38.39) show that satisfaction of (38.93-5) is not sufficient
for the existence of a BIB design.

„ • 10 in s how that the dual design obtained by putting t - b, b - t, k - r 

38.12 fc E T ^d hence c arl „o t be a BIB design. In Excise 38.11, show .ha, the 
md/ = k has A < n ^ esign derived £rom (38.76) is: 


dual of the augm 

Block number 1 


Verify that this is a BIB design. 

38.13 Show from (38.93) that for 

,/( " (38 93-4) to show that 

'1 and with ( = rM = -as above (38.98) use (389 > 

i Hc-\) = 

. . ‘ Heje establish ”(38.98). 

where 7 is a positive integei. W 


3 1 

4 5 

1 I 1 

! 1 

l! 2 

7 

: 8 1 5 

111 

I1210| 

115 

16 15 i 

1 19 

20 20 

BIB design. 


2 3 i a! 31 31 4U, 4,^115,^13^17 


3 7 11; 15'19 





„ cTATI sTlCS 

„ ,,-HEOB y ° F ctimator (38.118) is identic, 

ADVANCED IW ,he use of the muted mo de 

THE AU ra „dom^ n b 38.48). *> matrix of the estimator/ 

in particular being 


,, 8 I, ,o Show that 

38.15 In 38-5 ? . use ( EtB) = " T 


V(B) = « 1+W<, ! I ": se dly ft'» m thC bI ° Ck *° talS ^ 
« l a,meet parameters unb.assedly 

s0 tot we may estimate the «— £ . 

with 


i, - (nn'r lnB| 

Show that *r reduces to (38.29) when p -*■ «■ (cf. Tocher, 1952) 


3846 Show that T, of Exercise 38.15 coincides with the MV estimator at (38.118) if and 

only if / 

T'/diagfrJ-lj^ltuf 


.-I ‘/t- 


1+kp 


- )nB l = G. 




Verify that the equality holds when the design equation is of the BIB form (38.95). 

(Sprott (1956) obtained a similar result, but 
instead of Tj used an estimator combining T 
and t b in Exercise 38.15 with weights 
reciprocal to the variances of their elements.) 


38.17 In a BIB design the observations within each block are replaced by their ranks 
■.*• If Ti 15 ,he total ( of ,hc ranks th “ allotted to the ith treatment; and 

St-.^(Tt-T.ymZTj-tlirik + lW, 
with maximum value S m:i , = n f {n _ i\ /n , 

r ■*/«-»- 1V12, Show by extending 37.40-1 and Exercise 37.14 that 


Em = i*±2) 
Kt+\)' 


var W = - 2 ity) 2 / X-V 
eXaC ‘ ly ’ and ,ha >. approximately ^ t( ‘ + ^ ’ + ' 


k~\ 


\ 


P(<+1) ) 
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distribution with d.fr. 
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Vi = t~\~i 
r 


fi dCSign ' 


^2 = ('-IK 

of Exercise 37.14 when k = t, A == r, and the BIB bernm 

ecomes a randomized 

(Durbin, 1951) 


» * ' ± j 

Vnt any BIB design re f lva £J e j nt ° r , r ® plicates ° f t treatments, show that the 
38.18 F tments common to two blocks m different repl,cates has mean equal to k*/, 

»«o> bet0t about this mean proportional to , (b-t-r + l). Hence show that (38.98) holds, 

,uaO fsq d onlv if the e 1 uahty holds ln <- 38 - 98 ). there are exactly the same number fie 
s d that if 3X1(1 c ommon to any two blocks in different replicates. 


(Bose, 1942) 


Silfl 1 -r „ _ . 

that ^pnts common 

0 ) of treats 

, observation points given for k = 5 and k = 6 in the table (38.147). 
tn (K. Smith, 1918) 


38.19 


Verify 




. aa * U its smallest prime factor, and k^p, show that a BIB may always be 
38.20 If t 1S oc l .y • paual-differences method. Label the treatments 0, 1, 2, ... , t- 1 

ons.ruc.ed by the ~blocks claming the treatments [0,1, 2.*-!]; [0 2 4, 

, n d construct H £ t) -1)], where every number in the blocks is 

'(ft—1)]; • • •’ t °*.^ 1 j ’From each initial block form a new block by adding to each 

alculated as a residue (mod t). From^eachJ. ^ t We thus obtatn 

,f its treatments an integer r;! th*^ d i residue c i ass mod t) for each inmal block. 

, set of t blocks (one for eachmemb = p(t-l) and 2 = 

Show that the resulting BIB has b - *«(* - (Gassner, 1965) 
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SAMPLE 


“ aGNS 


ans « f ^P°^Tof random sampl ‘^®g"procedure requi^ 

°. f ° ur f SSty theory to a ^Se samples; it is by no 

:* - *“& ("«*”£*,™..»«I**., .hi, 

to distinguish betwee between successive y dependence. 

necessarily involves ment , w hich remove f shall be) concerned 

non, and samplmIf* P vo!uraes , we “ A chap ter, however, We 

For almostthe whole oi me ^ ^ and the following ^ practi 

ordy wrth^srmph rand^ ^ ^ forms of rand om sampling. ^ h ^ with a fi„i te 
IT,here is a problem of members’ Even in sampling with 

ssax 

“Strc: , :ES;,'=»i«,..». 

Interest in sample survey theory, as this branch of the subject has come to
be called, is almost entirely in the estimation of means (or, equivalently, totals) of the
variables being studied. The theory which we shall study is nevertheless more general
than this, since the mean of any function of a variable (e.g. of its square) may be treated
by the same method. We shall study the estimation of variances and other constants
of the population only in so far as this is necessary to throw light upon our central
concern, the estimation of means. Results for proportions are always derivable from
those for means by specializing the variable to take values 0 or 1 only.
We are thus entering a rather narrow area of statistical k„I *•«. • 

which has been intensively cultivated, and this on frounds of t 7 ’ . • * "" ^ 

rather than of its mathematical attractiveness The I • * P ? C !‘ Cal lm P°rtance 
recent years been summarized and supplemented h J 0urna literature has in 

Coclrran (1963), Hansen * a,. (1953)^Xs notabl y those of 

The last-mentioned book 

M- N. Murthy"(f963 U f r ‘ v UP ‘° ^ “ new editions shoe its °T S . UrVey samplin S 
We Shall have to e 7 recen ‘ theoretical devM H Publication in 1949. 
ffld our aim l C ° nfine our d «ussion to IZ deVel ° pments - 
o»not hope that all thel-esX ‘ hem b the eontex^or^' aSPeCtS ° f Sam P le surve y s ’ 
to th'eth? f ™ PMta "ce w7 e ,f ner , a ‘ Sta ‘ istical **>*■ * 
erea ‘ cu mbrousness of the sub^f y displa y ed . but we shall 


The 






i. 










SAMPLE SURVEY THEORY: DESIGNS 
Sampling with equal probabilities without replacement

39.2 We wish to estimate the mean $\mu$(*) of a variable $y$ in a finite population with $N$ members, from a sample of $n$ members drawn at random without replacement, with equal probabilities of selection. It is perhaps not quite obvious that (without any knowledge of the form of the population) the MV unbiassed linear estimator of $\mu$ is the sample mean $m$. We now use the Least Squares theory of Chapter 19 to demonstrate this.


Consider the model

$$\underset{(n\times 1)}{\mathbf{y}} = \underset{(n\times 1)}{\mathbf{1}}\;\underset{(1\times 1)}{\mu} + \underset{(n\times 1)}{\boldsymbol{\epsilon}},$$

where $\mathbf{1}$ is a vector of units. The errors $\epsilon_i$ are the deviations of the observations from
the population mean, and they are not uncorrelated, because the drawings are not
independent. By the symmetry of the situation, the covariance, say $\rho\mu_2$, between any
pair of observations in the sample is the same. It is this symmetry which leads us to
expect the sample mean to be the best estimator of $\mu$. The dispersion matrix of the
errors is

$$\mathbf{V}(\boldsymbol{\epsilon}) = \mu_2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} = \mu_2 \mathbf{V}. \qquad (39.2)$$


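By 19.17 and Exercise 19.5, the MV unbiassed linear estimator of $\mu$ is the LS estimator

$$\hat\mu = (\mathbf{1}'\mathbf{V}^{-1}\mathbf{1})^{-1}\,\mathbf{1}'\mathbf{V}^{-1}\mathbf{y} \qquad (39.3)$$

with variance

$$V(\hat\mu) = \mu_2\,(\mathbf{1}'\mathbf{V}^{-1}\mathbf{1})^{-1}. \qquad (39.4)$$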

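To use (39.3-4), we must evaluate $\mathbf{V}^{-1}$. As may be verified by multiplication with $\mathbf{V}$,
this is

$$\mathbf{V}^{-1} = \frac{1}{(1-\rho)\{1+(n-1)\rho\}} \begin{pmatrix} \{1+(n-2)\rho\} & -\rho & \cdots & -\rho \\ -\rho & \{1+(n-2)\rho\} & \cdots & -\rho \\ \vdots & \vdots & \ddots & \vdots \\ -\rho & -\rho & \cdots & \{1+(n-2)\rho\} \end{pmatrix}. \qquad (39.5)$$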
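The multiplication suggested for (39.5) can be carried out mechanically. The following is a minimal sketch in exact rational arithmetic, not part of the original text; the values of $n$ and $N$ are hypothetical, and the value $\rho = -1/(N-1)$ is used only because it is the one relevant to sampling without replacement (any $\rho$ for which $\mathbf{V}$ is non-singular would serve).

```python
from fractions import Fraction

def equicorrelation(n, rho):
    """V of (39.2): unit diagonal, rho everywhere else."""
    return [[Fraction(1) if i == j else rho for j in range(n)] for i in range(n)]

def inverse_39_5(n, rho):
    """Closed form (39.5) for the inverse of V."""
    scale = (1 - rho) * (1 + (n - 1) * rho)
    diag = (1 + (n - 2) * rho) / scale
    off = -rho / scale
    return [[diag if i == j else off for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Product of two square matrices held as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

n, N = 4, 10                       # hypothetical sample and population sizes
rho = Fraction(-1, N - 1)
V = equicorrelation(n, rho)
V_inv = inverse_39_5(n, rho)
product = matmul(V, V_inv)
identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
row_sums = [sum(row) for row in V_inv]   # elements of 1'V^{-1}, since V^{-1} is symmetric
```

Every element of `row_sums` is the same, which is why the LS estimator (39.3) weights all sample members equally and so reduces to the sample mean.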


Hence

$$\mathbf{1}'\mathbf{V}^{-1} = \{1+(n-1)\rho\}^{-1}\,\mathbf{1}', \qquad \mathbf{1}'\mathbf{V}^{-1}\mathbf{1} = n\{1+(n-1)\rho\}^{-1},$$

/ 0 £ t he population in this 

<*>For convenience, we write , (without suffix) for *e nwan^ ^ cent ral moments 
Wj <he next chapter only. The suffixed moments AO . [n these chapters on y) 

^ u sual. The corresponding sample moments are 


M 






(39.6) 
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an 1 


, uate the« rre Swhatever? ^ (/J) reduces to Z e ro , 
we have not ha f t0 gg Ct ion remail ? n I that when n w hen w = jy 

a pplieat l0n » : s s ampl ecl ‘ values is m r 

P ,u P whole popnlat 10 . 0 f sample ep i a cement, 

size. Thus, q p = -^4 




tth 


(39.7) 

^ |j + (#" delayed this delib er . 

1 T^ehave^ j n our present 


and (39.7) becomes

$$V(\hat\mu) = \frac{\mu_2}{n}\,\frac{N-n}{N-1}. \qquad (39.9)$$
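The result (39.9) is easily checked by brute force for a small population. The sketch below is an illustration only, not part of the original text; the population values are hypothetical. It enumerates every possible sample of size $n$, each equally likely under equal-probabilities sampling without replacement, and compares the exact variance of the sample mean with the right-hand side of (39.9).

```python
from fractions import Fraction
from itertools import combinations

def sample_mean_moments(population, n):
    """Exact mean and variance of the sample mean over all equally likely samples."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    expectation = sum(means) / len(means)
    variance = sum((m - expectation) ** 2 for m in means) / len(means)
    return expectation, variance

population = [3, 7, 8, 12, 20, 22]        # hypothetical y-values
N, n = len(population), 3
mu = Fraction(sum(population), N)
mu2 = sum((y - mu) ** 2 for y in population) / N   # second central moment
expectation, variance = sample_mean_moments(population, n)
formula_39_9 = mu2 / n * Fraction(N - n, N - 1)
```

The factor $(N-n)/(N-1)$ vanishes when $n = N$, as it must when the whole population is sampled.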


t mean and their unbiassed estima . ^ distribution 

•.TSJSJStty" r "‘”lE=3^» 

39.3 A y a 7\ ; aprived in Example 12.8 using , j population A- 

”e found it algebraically c 0 ”^”* "er we Sdo the same. In fact, we shall 

istics, k, and and throughout ms chapter 
the population “variance” as

$$\sigma^2 = K_2 = \frac{N}{N-1}\,\mu_2$$

and the sample “variance” as

$$s^2 = k_2 = \frac{n}{n-1}\,m_2;$$

we retain the name “second moment” for $\mu_2$ and $m_2$. With this notation, we rewrite
the results of Example 12.8 (which include (39.9)) as(*)
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all from (12.109) that ' S1GNS 

" m - K„ 




V 9 

„ may substitute s 2 for a 2 and ft, f or k \ m m 

*£•" 01 “■” ib “" «• -LJ 

■ -(U> 1 

«--,)•! - >(1->)}. >»->’> 

biassed estimator of the fourth moment is *\\oh*Ur 

Th ;;tar in the X, We require an unbia ^tZZ\T7-Tt^ 
18 ^ g ives at once the result r ot a _ *C 2 . Exercise 

* / / n T \ y T 


k „_(N-n)(Nn~n~N-\) , 
n(n l)A'"(.VTn 

w-ti ,:— 

(«-l)(W+l) +1 


= K- 
/v 2> 


which reduces to 

pf/w-iN/' iV+l 'Ws (N-n)(Nn-n- iV—1) , ) 
E \U + l/U-y 2 «(n + l)JV(iV-l) 


(39.13) 


V, V / > 9 ' 9 \ * / 

Substitution of the random variable in braces in (39.13) for $\sigma^4$, and $k_4$ for $K_4$, in the last
equation of (39.10) gives an unbiassed estimator of the fourth moment of $m$.

39.4 There is nothing to prevent us, in any given case, from estimating the first
four moments of $m$ as indicated in 39.3, and then fitting a Pearson distribution to
obtain an estimate of the sampling distribution; the situation here is absolutely analogous
to that of Maximum Likelihood estimation in 18.2#, Vol. 2. However, here as there,
the process of fitting a small-sample distribution by moments is rarely carried out.
This is less due to laziness than to the fortunate fact that the Central Limit theorem
makes the labour unnecessary for the

we shall not prove the limiting norma i y o , cannot simply let n tend to 

in connexion with the nature of the limiting proce • established a Central 

infinity, since it cannot exceed N. Thus ^ill s^ect only to n/N remaining 
Limit result in this case, allowed both n and skewness and kurtosis 

bounded away from 1. It is easy to verify horn ^ Hijek (i960) 

coefficients of m tend to the normal va ues un limiting normality of m. 
gives a necessary and sufficient condition « r ,‘^'“'“described in ^V 
We may thulapply the standard ««« values 

to the distribution of m, and carry ou way It is only randonl 

mean, or set confidence limits for /t, m effectively pr° cee< * as 

be large enough. If n/N is small, we may elf 

sampling. 






•«Hivid uals reCOg °f sufficiency• A sufficient 

: * i the concept o ararn eter. Now, i n 


; able 


A suffice 


irvey >ry ’ j . the concep 1 , e parameter, -lnow, lri 

. c y ill samp le 8 develops , c oncet mng ] at i 0 n distribution, b ut 

•f:?l !««. V ^X» i" *: of ** the lV-dimens ion ‘ 

vector Of the values h ^ wlthou t re P‘ vec tor. An J ™ , population members 
(whether * e ec ^ e i em ents of the P 31 - 3 same set o fwhether these d members 

or fewer of tta>**£, wh ich »»« "“£ er , irre spect.ve of ca ]led equiva , m 


r r Tlip set ot an eacn cutsets ox a sumcierit 

rS— »rsLTiw#”"- ■ "T.'.'L.ri. *• '•««»«■* 


equivalent > j another sufficieu 1 F aA nation to wn xcxl cvt/A -X WLU ^r 

partition can be merged and ano he a sufficlc nt P 3 ^ 0 ” “ [d m 

a P smaller sufficient P 3 ™«on.need by such a merging process, tt wou 

we induce a partmon of the set o 11 p ;on thus induced by t is sufficient, f 

samples with a particular value of t. it tne p subsets 0 f equivalent samples, 

itself is a sufficient statistic, since itsva ue h will te ll u s nothing further 

The conditional distribution of any othel statistic, giv 

The theory of sufficient statistics in Chapters 17 and 23 now applies. In
particular, the Rao-Blackwell result of 17.35 states that, given any unbiassed estimator
of a function of the parameter, we can improve upon it by using instead its conditional
expectation given a sufficient statistic.

Simple as this result is, it has some unexpected consequences. Consider the simplest
case of random sampling with replacement from a finite population of $N$ members, to
estimate the population mean. The intuitive estimator is the sample mean, $m$. However,
this can be improved upon, for in general the $n$ sample members will include some
individuals selected more than once, and, as we have seen, it is the $d \le n$ distinct
individuals in the sample which form the basis for equivalent samples. Thus the vector
statistic $\mathbf{t}$, consisting of the values $y^{(1)}, y^{(2)}, \ldots, y^{(d)}$ attached to the $d$ distinct individuals
in the sample, is a sufficient statistic.

evidently the mroOheXS ° f * * is 

smaller variance than m. R a i and Kham' nocro\ e> sa ^ and this will have 

explicitly—see Exercise 39.1 Vhere as ™ h' ^ ^ reduCtion in variance 
proved by the Rao-Blackwell method have raL ’ T P ' e SUrVe y es t™ators im- 

ofestimat S ion e tfoT I \ m J Sampling with •'cphcemen^it' 15 var,ances to evaluate, 
rather than all theJ^r elected individuals a' W ', , a wa y s improve precision 

’ * S a ' Wa y s a " u PPer bound £ ?“ tor is ' howeTCr ' 

bat of the former estimator, 
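The gain from using only the distinct individuals can be seen by direct enumeration. The sketch below is an illustration only, not part of the original text; the population values are hypothetical. It lists all $N^n$ equally likely ordered samples drawn with replacement and computes, for each, the ordinary sample mean $m$ and the mean of the distinct values (called `m_d` here, a name adopted for the illustration).

```python
from fractions import Fraction
from itertools import product

def moments(values):
    """Mean and variance of a list of equally likely values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

population = [1, 4, 6, 13]                # hypothetical y-values
N, n = len(population), 3
m_values, md_values = [], []
for labels in product(range(N), repeat=n):        # all N**n ordered samples
    m_values.append(Fraction(sum(population[i] for i in labels), n))
    distinct = set(labels)                        # individuals actually observed
    md_values.append(Fraction(sum(population[i] for i in distinct), len(distinct)))

mean_m, var_m = moments(m_values)
mean_md, var_md = moments(md_values)
```

Both estimators are unbiassed for the population mean, but `var_md` is the smaller, in agreement with the Rao-Blackwell argument: repeated selections of the same individual carry no further information.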
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its e ^ k ( 1961 a, b, 1962a, b, 1964a, b, c ) has r • , 
of tfeaPP Uca,i ° n ° f thC Ra °- Bkckwe » ”«hod to 
theofy' U Sam P^ e survey 

Sampling without replacement with unequal probabilities

39.6 We now generalize random sampling without replacement by allowing the
probabilities of selection to differ between individuals and from drawing to drawing.
Let ${}_{(r)}p_i$ be the probability that the $i$th individual (with value $y_i$) is selected at the $r$th
drawing, $\sum_{i=1}^{N} {}_{(r)}p_i = 1$; $i$ runs from 1 to $N$ and $r$ from 1 to $n$. Now let

$$\pi_i = \sum_{r=1}^{n} {}_{(r)}p_i, \qquad (39.14)$$

$$\pi_{ij} = \mathop{\sum\sum}_{\substack{r,s=1 \\ r \ne s}}^{n} {}_{(r)}p_i\; {}_{(s)}p_j, \qquad (39.15)$$

the later probability on the right of (39.15) being taken as conditional upon the
earlier event being realized. From their definitions, $\pi_i$ is the overall probability that
$y_i$ is selected for the sample of size $n$, and $\pi_{ij}$ is the joint probability that both $y_i$
and $y_j$ ($i \ne j$) are selected for the sample. Clearly, the complete set of ${}_{(r)}p_i$, which we
shall call the selection scheme, determines the $\pi_{ij}$ and $\pi_i$, but the same set of $\pi_{ij}$ and $\pi_i$ may be
associated with different selection schemes. It is usually a difficult matter to find
values of the ${}_{(r)}p_i$ to achieve desired values of $\pi_{ij}$ and $\pi_i$, but in selection schemes such
as those of Exercises 39.5-6 the connecting equations can be solved numerically.
Fellegi (1963) gives a recursive method of making the ${}_{(r)}p_i$ equal for every drawing.

Equal-probabilities sampling without replacement, which we have already considered, has

$$\pi_i = \frac{n}{N}, \qquad \pi_{ij} = \frac{n(n-1)}{N(N-1)}.$$

In general the reader may verify from (39.14-15) that

■ 1 


(39.16) 
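The quantities $\pi_i$ and $\pi_{ij}$ can be computed exactly for any small selection scheme. The sketch below is an illustration only, not part of the original text. It assumes one particular scheme (successive draws without replacement, each draw taking a remaining individual with probability proportional to a fixed hypothetical $p_i$), and checks the standard relations $\sum_i \pi_i = n$ and $\sum_{j \ne i} \pi_{ij} = (n-1)\pi_i$ for samples of fixed size; it is an assumption, not confirmed by the surviving text, that these are the relations recorded at (39.16).

```python
from fractions import Fraction
from itertools import permutations

def inclusion_probabilities(p, n):
    """pi_i and pi_ij for n successive draws without replacement, each draw
    taking a remaining individual with probability proportional to its p."""
    N = len(p)
    pi = [Fraction(0)] * N
    pij = [[Fraction(0)] * N for _ in range(N)]
    for order in permutations(range(N), n):       # every ordered sample
        prob, remaining = Fraction(1), Fraction(1)
        for i in order:
            prob *= p[i] / remaining              # conditional probability of this draw
            remaining -= p[i]
        for i in order:
            pi[i] += prob
            for j in order:
                if i != j:
                    pij[i][j] += prob
    return pi, pij

p = [Fraction(k, 10) for k in (1, 2, 3, 4)]       # hypothetical first-draw probabilities
n = 2
pi, pij = inclusion_probabilities(p, n)
N = len(p)
```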


We now follow the notational convention that TUs will not generally “iucide 

v, v v in the order in which they are diaw This means that, 1 e.gv 

with die order of labelling of the population Tn Tj, ; • ^ ir , tere sts of sim ^ k 
Jb in the sample is not y 3 in the P°P blat ^ sym mettic functions of the 
shall retain this notation when we are consideri g 

Val “ eS ' saffl le as a » Wc - and ^ " 

In taking expectations, we consid er 


n 

1 






172 


theorv of statistics 

the. advanced samples , and we suppose thesc 

, triable There <« L P c . . . Vi' By definiti »n, 

* "» ra " d0m 1 end labelled 5„ A. 5 ” ( » } 

to be listed in some o \ 

£ /^,.U ( SProb{54?/'>- 
«K(w) -%,// '- 1 / v 

omhled into iV sets of ( iV " 1\ 

values, corresponding to the 

population can enter the sample. Thus 

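$$nE(m) = \sum_{i=1}^{N} y_i \Big[\sum_{r} \mathrm{Prob}\{S_r\}\Big] = \sum_{i=1}^{N} \pi_i y_i = \Gamma, \qquad (39.17)$$

say, the inner summation extending over those samples $S_r$ which contain $y_i$. Thus, as we should expect, the sample mean is not generally unbiassed for the
population mean. We also have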


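$$V\Big(\sum_{i=1}^{n} y_i\Big) = E\Big\{\Big(\sum_{i=1}^{n} y_i\Big)^2\Big\} - \Gamma^2 = E\Big(\sum_{i=1}^{n} y_i^2\Big) + E\Big(\mathop{\sum\sum}_{i \ne j}^{n} y_i y_j\Big) - \Gamma^2.$$

The first expectation is evaluated exactly as at (39.17), and is equal to $\sum_{i=1}^{N} \pi_i y_i^2$. In
the second expectation, there are $\binom{N}{n}$ sets of $n(n-1)$ values, which we reassemble in
$N(N-1)$ sets of $\binom{N-2}{n-2}$ values, corresponding to the $\binom{N-2}{n-2}$ ways in which each
of the $N(N-1)$ pairs of individuals in the population can enter the sample. Thus

$$E\Big(\mathop{\sum\sum}_{i \ne j}^{n} y_i y_j\Big) = \mathop{\sum\sum}_{i \ne j}^{N} \pi_{ij}\, y_i y_j,$$

and therefore

$$V\Big(\sum_{i=1}^{n} y_i\Big) = \sum_{i=1}^{N} \pi_i y_i^2 + \mathop{\sum\sum}_{i \ne j}^{N} \pi_{ij}\, y_i y_j - \Gamma^2. \qquad (39.18)$$

The expectations which we evaluated to obtain (39.17-18) are special cases of

$$E\Big\{\sum_{i=1}^{n} g(y_i)\Big\} = \sum_{i=1}^{N} \pi_i\, g(y_i), \qquad E\Big\{\mathop{\sum\sum}_{i \ne j}^{n} h(y_i, y_j)\Big\} = \mathop{\sum\sum}_{i \ne j}^{N} \pi_{ij}\, h(y_i, y_j), \qquad (39.19)$$

which require no further proof, since the argument which we used remains unchanged
when $y_i$ and $y_i y_j$ are replaced by arbitrary functions.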

We may now obtain an unbiassed linear estimator of $\mu$. If a weight
$w_i$ is attached to an individual value $y_i$ in the population whenever it is selected,
we must have

$$E\Big(\sum_{i=1}^{n} w_i y_i\Big) = \mu,$$

and hence, by (39.19) with $g(y_i) = w_i y_i$,

$$\sum_{i=1}^{N} \pi_i w_i y_i = \mu = \frac{1}{N}\sum_{i=1}^{N} y_i.$$

This must hold identically in the $y_i$, so we equate coefficients of $y_i$ and find $w_i = 1/(N\pi_i)$,
and thus

$$\hat\mu = \frac{1}{N}\sum_{i=1}^{n} \frac{y_i}{\pi_i} \qquad (39.20)$$

is the only unbiassed estimator of this form. Further, the variance of $\hat\mu$ is simply
(39.18) with $y_i/(N\pi_i)$ replacing $y_i$. Thus

$$N^2 V(\hat\mu) = \sum_{i=1}^{N} \frac{1-\pi_i}{\pi_i}\, y_i^2 + \mathop{\sum\sum}_{i \ne j}^{N} \frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j}\, y_i y_j. \qquad (39.21)$$
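The unbiassedness of (39.20) and the variance formula (39.21) may be checked numerically. The sketch below is an illustration only, not part of the original text; the population values and the selection scheme (successive draws without replacement with probabilities proportional to hypothetical $p_i$) are assumptions. The variance is taken in the form with coefficients $(1-\pi_i)/\pi_i$ and $(\pi_{ij}-\pi_i\pi_j)/(\pi_i\pi_j)$, which follows from (39.18) on replacing $y_i$ by $y_i/(N\pi_i)$.

```python
from fractions import Fraction
from itertools import permutations

def design(p, n):
    """Probability of each unordered sample under successive draws without
    replacement with probabilities proportional to p (illustrative scheme)."""
    probs = {}
    for order in permutations(range(len(p)), n):
        prob, remaining = Fraction(1), Fraction(1)
        for i in order:
            prob *= p[i] / remaining
            remaining -= p[i]
        key = frozenset(order)
        probs[key] = probs.get(key, 0) + prob
    return probs

y = [2, 5, 9, 11]                                  # hypothetical population values
p = [Fraction(k, 10) for k in (1, 2, 3, 4)]        # hypothetical draw probabilities
N, n = len(y), 2
probs = design(p, n)
pi = [sum(pr for s, pr in probs.items() if i in s) for i in range(N)]
pij = [[sum(pr for s, pr in probs.items() if i in s and j in s) for j in range(N)]
       for i in range(N)]

def mu_hat(s):
    """Estimator (39.20) evaluated on the sample s."""
    return sum(y[i] / pi[i] for i in s) / N

mean = sum(pr * mu_hat(s) for s, pr in probs.items())
variance = sum(pr * (mu_hat(s) - mean) ** 2 for s, pr in probs.items())
formula_39_21 = (
    sum((1 - pi[i]) / pi[i] * y[i] ** 2 for i in range(N))
    + sum((pij[i][j] - pi[i] * pi[j]) / (pi[i] * pi[j]) * y[i] * y[j]
          for i in range(N) for j in range(N) if i != j)
) / N ** 2
```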


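39.8 We can obtain an unbiassed estimator of (39.21) by use of (39.19). By
inspection, such an estimator is

$$N^2 \hat V_1(\hat\mu) = \sum_{i=1}^{n} \frac{1-\pi_i}{\pi_i^2}\, y_i^2 + \mathop{\sum\sum}_{i \ne j}^{n} \frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j\pi_{ij}}\, y_i y_j, \qquad (39.22)$$

proposed by Horvitz and Thompson (1952). On the other hand, we may use (39.16) to rewrite (39.21) in the identical form

$$N^2 V(\hat\mu) = \tfrac{1}{2} \mathop{\sum\sum}_{i \ne j}^{N} (\pi_i\pi_j - \pi_{ij}) \Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2. \qquad (39.23)$$

Thus, by (39.19), a second unbiassed estimator is

$$N^2 \hat V_2(\hat\mu) = \tfrac{1}{2} \mathop{\sum\sum}_{i \ne j}^{n} \frac{\pi_i\pi_j - \pi_{ij}}{\pi_{ij}} \Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2, \qquad (39.24)$$

which was proposed by Yates and Grundy (1953).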
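Both variance estimators can be checked for unbiassedness by enumeration. The sketch below is an illustration only, not part of the original text; the population, the draw probabilities and the selection scheme are the same hypothetical ones as might be used to check (39.21), and the function names `v_ht` and `v_yg` are labels adopted here for the Horvitz-Thompson form (39.22) and the Yates-Grundy form (39.24), each divided by $N^2$.

```python
from fractions import Fraction
from itertools import permutations

y = [2, 5, 9, 11]                                  # hypothetical population values
p = [Fraction(k, 10) for k in (1, 2, 3, 4)]        # hypothetical draw probabilities
N, n = len(y), 2

# Probability of each unordered sample under successive draws without replacement.
probs = {}
for order in permutations(range(N), n):
    prob, remaining = Fraction(1), Fraction(1)
    for i in order:
        prob *= p[i] / remaining
        remaining -= p[i]
    probs[frozenset(order)] = probs.get(frozenset(order), 0) + prob

pi = [sum(pr for s, pr in probs.items() if i in s) for i in range(N)]
pij = [[sum(pr for s, pr in probs.items() if i in s and j in s) for j in range(N)]
       for i in range(N)]

def pairs(s):
    """Ordered pairs of different sample members."""
    return [(i, j) for i in s for j in s if i != j]

def mu_hat(s):
    return sum(y[i] / pi[i] for i in s) / N

def v_ht(s):
    """Variance estimator of the form (39.22)."""
    single = sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in s)
    double = sum((pij[i][j] - pi[i] * pi[j]) / (pi[i] * pi[j] * pij[i][j]) * y[i] * y[j]
                 for i, j in pairs(s))
    return (single + double) / N ** 2

def v_yg(s):
    """Variance estimator of the form (39.24)."""
    return sum((pi[i] * pi[j] - pij[i][j]) / pij[i][j]
               * (y[i] / pi[i] - y[j] / pi[j]) ** 2 for i, j in pairs(s)) / (2 * N ** 2)

mean = sum(pr * mu_hat(s) for s, pr in probs.items())
variance = sum(pr * (mu_hat(s) - mean) ** 2 for s, pr in probs.items())
mean_v_ht = sum(pr * v_ht(s) for s, pr in probs.items())
mean_v_yg = sum(pr * v_yg(s) for s, pr in probs.items())
```

The two estimators agree in expectation but not sample by sample; listing `v_ht(s)` and `v_yg(s)` over the samples shows how differently they can behave for the same design.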


m- 


gTA' rlSrrlCS 
rH EOR Y 0F ,39 23 ) and its estimator 

* AnVANCE D the variance t - t neceS sarily So 

THE APVAi^ d to that *. Vb-t approxilnate 

174 u nse each ^ P r ° P /g xer cise 39- ora ctice, w ith y. Fellegi 

4jg ^ values of the *" 

for the alte . f we have 1 s method ator . j estimated variance 

to this situa ^ Rao 0^6 ^ ar jance of the es va riance an ... t j es w jth replace- 

mem in W^h .he who.e samP rf sam plmg variance ts 

, „Hich of these mternafveest^ ^ vaues 

I. is not at first clear «toh (he estimator (39-22) ^ negat i V e va ues 

preferable. It * e f d /. 8 “°™ s0 obvious that (39.24) can ^ jn tw0 selec ton 

/ r f Exercise 39.3), but i . « It is neverth # 5-6 give details— 

Lfeercise 39.4; shows that this a er negative-Exercisesi • ]y pre fer abl e 

schemes used ‘"/^‘^Thin seems no doubt thaand for an 

and this is not so for 3922. mor e likely to take negatvev # 

to (39.22), since he latter s, 8 y . itive expectation) this P 
estimator of variance (with necessarily p _ 

sampling variance.

It seems likely that there is no estimator of the sampling variance (39.21) which is
non-negative definite for all selection schemes, but so far as we know this has not been
demonstrated.

39.9 However, by adopting a different linear estimator from (39.20), we can put
ourselves in the position of always having a non-negative estimator of the sampling
variance of our estimator. In order to do this, we must no longer confine ourselves
to linear functions of the form $\sum_{i=1}^{n} w_i y_i$ discussed when defining $\hat\mu$ at (39.20), for we saw
there that $\hat\mu$ is the unique such linear function which is unbiassed.

In constructing linear estimators of the population mean from a set of sample
values, the coefficients attached to each sample value may be made to depend on:

(a) the individual population member whose selection for the sample yields that
value;

(b) the drawing at which the sample value was selected;

1 uu ^r;'f ( :) memberS Se,ected for * he -P>e, rather than the 

® thus seven geaerTctesMof Unea"^^^^ 0 ’ U f ‘ hree ° f (a) ’ ( c )> and there 

showS we C cannot a fi d d a “ aly “ d by God fmbe°(i955 T^r a “ ^ °* herS ' 

a prooosiu fi c d 3 “ ,nimu m variance unhi, ’ 1 , 965 ). and b y Koop (1963), who 
knowledge of the poDm f 1 ‘ S intuit ively plausible friT e f tlmator in tb 's most general 
with rero the “^deration that? given 

the mselves ' b 1 ‘hat this est i mator y a linear unbiassed estimator 

.h. populalion 

°ptimurn linear ect* ' d > as we see nf Exercise 
estimator here is due to the fact 


I 


* 

) 
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Ire ?! fawso far considered only coefficients depend! P ab '' meS SeleCt ' 0n 
Jm estimator whose coefficients depend on all £" 0 ^ 

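39.10 Let $y_{(r)}$ now denote the individual selected at the $r$th drawing and $p_{(r)}$
be the conditional probability of its selection at that drawing, given that it has not been
selected previously, and define

$$z_1 = y_{(1)}/p_{(1)}, \qquad z_u = \sum_{r=1}^{u-1} y_{(r)} + y_{(u)}/p_{(u)}, \quad u = 2, 3, \ldots, n. \qquad (39.25)$$

Each of the $n$ $z_u$ is unbiassed for $N\mu$, since

$$E(z_1) = \sum_{i=1}^{N} y_i = N\mu$$

by (39.19), and for $u \ge 2$,

$$E(z_u) = E\Big[\sum_{r=1}^{u-1} y_{(r)} + \Big\{N\mu - \sum_{r=1}^{u-1} y_{(r)}\Big\} \,\Big|\, y_{(1)}, \ldots, y_{(u-1)}\Big] = E(N\mu) = N\mu. \qquad (39.26)$$

Thus any linear function

$$z = \sum_{u=1}^{n} c_u z_u, \qquad \sum_{u=1}^{n} c_u = 1,$$

will be unbiassed for $N\mu$. The most symmetrical such function is the mean

$$\bar z = \frac{1}{n}\sum_{u=1}^{n} z_u = \frac{1}{n}\Big\{\sum_{u=1}^{n} \frac{y_{(u)}}{p_{(u)}} + \sum_{u=2}^{n}\sum_{r=1}^{u-1} y_{(r)}\Big\}. \qquad (39.27)$$

(39.27) does not reduce to $Nm$ when probabilities of selection are equal (cf. Exercise
39.10). The variance of $\bar z$ is

$$V(\bar z) = \frac{1}{n^2}\Big\{\sum_{u=1}^{n} \operatorname{var} z_u + \mathop{\sum\sum}_{u \ne v} \operatorname{cov}(z_u, z_v)\Big\}. \qquad (39.28)$$

By evaluating the covariance of $z_u$ and $z_v$ ($u < v$) in two stages, the first for fixed $z_u$ and the
second allowing $z_u$ to vary, it follows at once that $\operatorname{cov}(z_u, z_v) = 0$, i.e.

$$E(z_u z_v) = E(z_u)E(z_v) = N^2\mu^2. \qquad (39.29)$$

Thus we only have to evaluate the variances. In general, $\operatorname{var} z_u$ is cumbersome
to evaluate (see Exercise 39.9), but it is easy to estimate $V(\bar z)$, for

$$V(\bar z) = E(\bar z^2) - N^2\mu^2,$$

so that, using (39.29), an unbiassed estimator of $V(\bar z)$ is $\bar z^2 - z_u z_v$ for any $u \ne v$. If
we average this estimator over all $\tfrac{1}{2}n(n-1)$ pairs $u, v$, we obtain the estimator

$$\hat V(\bar z) = \bar z^2 - \frac{1}{n(n-1)} \mathop{\sum\sum}_{u \ne v} z_u z_v, \qquad (39.30)$$

which is identical with

$$\hat V(\bar z) = \frac{1}{n(n-1)} \sum_{u=1}^{n} (z_u - \bar z)^2. \qquad (39.31)$$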
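The properties claimed for the ordered estimator can be confirmed by enumerating every ordered sample. The sketch below is an illustration only, not part of the original text; the population values and the draw probabilities are hypothetical, and the selection scheme assumed is successive drawing without replacement with probabilities proportional to fixed $p_i$, so that the conditional probability at each drawing is $p_i$ divided by the total $p$ of the individuals not yet selected.

```python
from fractions import Fraction
from itertools import permutations

y = [2, 5, 9, 11]                                  # hypothetical population values
p = [Fraction(k, 10) for k in (1, 2, 3, 4)]        # hypothetical draw probabilities
N, n = len(y), 3

def ordered_samples():
    """Yield (probability, [z_1, ..., z_n]) for every ordered sample."""
    for order in permutations(range(N), n):
        prob, remaining, drawn_total, z = Fraction(1), Fraction(1), 0, []
        for i in order:
            cond = p[i] / remaining                # conditional probability p_(u)
            prob *= cond
            z.append(drawn_total + y[i] / cond)    # z_u as at (39.25)
            drawn_total += y[i]
            remaining -= p[i]
        yield prob, z

def z_bar(z):
    """The mean (39.27)."""
    return sum(z) / n

def var_hat(z):
    """The variance estimator (39.31)."""
    centre = z_bar(z)
    return sum((zu - centre) ** 2 for zu in z) / (n * (n - 1))

results = [(prob, z_bar(z), var_hat(z)) for prob, z in ordered_samples()]
mean = sum(pr * zb for pr, zb, _ in results)
variance = sum(pr * (zb - mean) ** 2 for pr, zb, _ in results)
mean_var_hat = sum(pr * v for pr, _, v in results)
```

Being a sum of squares, (39.31) cannot be negative, which is the property that motivated the change of estimator in 39.9.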
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' 8 ; eleCted at kast twice. 7 39 ° r 16 r e f ° r tHe Sam P le > and 
** 1 ^ *1 ^ »(«■ 


% that' 

n i = «• 


I 


THEORY 

, n d A-statist* 0 samp]ing the T,- (with 
, »- where s' 10 ~ , problem 0 , ca i cu lating their 

(39.31) is of the ° ra * . therefore redul )f t0 samphnS * ie . “ y we re sampled with 

=a£S r $0 :s " “'* 

““ probabilities a» d wt 20) and (39.27) to use. Little 

given as E«— 39 '’ 2 ' which „f the estimators reports tw0 samphng 

39 .U Wehaveto d ee>.eeffic>en c ‘es, “ strongly. obtained by 

is known in genera' ^ ^ favour (39.Z7) < ^ ^ (39 . 2 7) can D ^ y 

eXP Hiwe«r. “estimator ^proving an .^"^“dence on the order of 

niving the Rao-Blackwel t0 emphasize it P observed sample 

X (3«7) (which « now rd^ afpossible orders m wh.c^ probabilities 

drawing the sample) average these n ! values / t1 * mator z s - The esti- 

weights thus obtabing - -pro™ w obtain a „ improved 
mator of variance at (39.31) can b e treat J d ^j. Murthy (1957), are direct con- 

estimator of mo)- These r “ ultS ’ , d of , 7 35 a „d are given in Exercises 39.30-1. 
sequences of the Rao-Blackwell £0 ss a non-negative estimator of 

The improved estimator v, has no Murthy showed that this is so for 

; a T tl" MS)^^ifog’experiments already mentioned, M. N. 
ml, m confirmed {ha! * is more efficient than 4, and that an unbiassed esti- 
mator of its sampling variance fluctuates less than (39.31) does for z (s) . Finally, the 
improved estimator of F(z (s) ) also achieved worthwhile gains in efficiency over (39.31). 

There are thus strong theoretical reasons for using the improved estimator $\bar z_s$.
However, computational problems become formidable as $n$ increases, since $n!$ different
sample orderings have to be considered, and even $\bar z_{(s)}$ as originally defined at (39.27)
requires the computation of $n$ conditional probabilities, which may require considerable
labour when population size $N$ and sample size $n$ are large. When the selection scheme

n S « : a S mEx e r c ise39.3.; b yk £ epin g pr°babi,ities proportional at all drawings 
a, takes a ampler form, but still seems formidable to compute. ara ™ngs, 

U 7n P :r Ui "' K “ Samp " n S whh replacement 

’ d ° w re 9 UIre the further definition d ” 1 and ^ by 

Jl 


j*=i J 




\ 


| 

i 


i). 


(39.32) 
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id> 


. . ..a . . . -o lVjiNS 

Ids for » n V 1 mthout rest " ctl0 «. Similarly (J ^ 


*f£x -&**<*). 

e{ss = SS 

IT] 


is r 


j / being unrestricted in the final double summation in (39 33 ^ T „ 
i an* {«.. may now have its suffixes equal, but the t?rm - In (39.18) and 
( 39 ‘ 2 1 st still have different suffixes, as the reader should* ** d ° Uble summ a- 

%%ii* •< “ *»<•«» ■'" ” - 

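In the particular case when the probabilities of selection are the same at each drawing,
so that we may write them $p_i$ without a prefix, we have
$\pi_i = np_i$ and $\pi_{ij} = n(n-1)p_ip_j$, and the estimator (39.20) becomes

$$\hat\mu = \frac{1}{Nn}\sum_{i=1}^{n} \frac{y_i}{p_i}, \qquad (39.34)$$

while (39.23) becomes

$$N^2 V(\hat\mu) = \frac{1}{2n} \mathop{\sum\sum}_{i,j=1}^{N} p_i p_j \Big(\frac{y_i}{p_i} - \frac{y_j}{p_j}\Big)^2, \qquad (39.35)$$

and using (39.33), the unbiassed estimator of (39.35) is

$$N^2 \hat V(\hat\mu) = \frac{1}{2n^2(n-1)} \mathop{\sum\sum}_{i,j=1}^{n} \Big(\frac{y_i}{p_i} - \frac{y_j}{p_j}\Big)^2.$$

Using (2.27), Vol. 1, this is

$$N^2 \hat V(\hat\mu) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \Big(\frac{y_i}{p_i} - \frac{1}{n}\sum_{j=1}^{n} \frac{y_j}{p_j}\Big)^2 = \frac{s^2}{n}, \qquad (39.36)$$

where $s^2$ is the second sample $k$-statistic, defined as in 39.3, of the $y_i/p_i$. The simplicity
of the result (39.36) springs from the fact that the $y_i/(Np_i)$ are unbiassed estimators
of $\mu$, uncorrelated because sampling is with replacement, so that Exercise 39.12 applies
here.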
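A small enumeration illustrates the with-replacement results. The sketch below is an illustration only, not part of the original text; the population values and selection probabilities are hypothetical. It forms every ordered sample of size $n$ drawn with replacement, evaluates the estimator $\frac{1}{Nn}\sum y_i/p_i$ and the variance estimator $s^2/(nN^2)$ on each, and compares their expectations with the population mean and with the exact variance, the latter written in the equivalent form $\{\sum y_i^2/p_i - (\sum y_i)^2\}/(nN^2)$.

```python
from fractions import Fraction
from itertools import product

y = [2, 5, 9, 11]                                  # hypothetical population values
p = [Fraction(k, 10) for k in (1, 2, 3, 4)]        # hypothetical selection probabilities
N, n = len(y), 2

def estimate(labels):
    """Return (mu-hat, estimated variance of mu-hat) for one ordered sample."""
    ratios = [y[i] / p[i] for i in labels]
    centre = sum(ratios) / n
    s2 = sum((r - centre) ** 2 for r in ratios) / (n - 1)   # second k-statistic
    return centre / N, s2 / (n * N ** 2)

results = []
for labels in product(range(N), repeat=n):         # all ordered samples, with repeats
    prob = Fraction(1)
    for i in labels:
        prob *= p[i]
    results.append((prob, *estimate(labels)))

mean = sum(pr * m for pr, m, _ in results)
variance = sum(pr * (m - mean) ** 2 for pr, m, _ in results)
mean_var_hat = sum(pr * v for pr, _, v in results)
formula = (sum(yi ** 2 / pi for yi, pi in zip(y, p)) - sum(y) ** 2) / (n * N ** 2)
```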


Sample designs: stratification

39.13 We saw in 39.8 that we cannot hope in practice to find a selection scheme
which will reduce the sampling variance (39.21) to its minimum of zero.
As indicated there, however, we may have available (though possibly
rather imprecise) information about the variable $y$, or other variables correlated
with it, which enables us to improve considerably on random sampling. The
aim of a sample design (i.e. a choice of the $\pi_i$ and the $\pi_{ij}$) is to reduce estimation
variance; later, we shall modify this statement to take into account
varying costs. When dealing with
several variables simultaneously, we can only use some compromise design which
will be effective in producing small variance estimators. It
is this need for compromise which lends point to the consideration of various classes
of selection schemes which have the aim of variance-reduction in mind.
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D thbobT of fixed and only the 

. TH d E 

“* 

;lS-f * h iS n0W ^Vm * ** - *: 

q y • • lot possible to sa >^ W ^_ |ue s of the ;n> we • 39.6 that in ordinary 

:T,» .„ _ x-ij 

r.F *■ sw^i 

equal-probabilities samp mg ^ ^ ^ ^ every on e of the %N{ N ! ) te tm S 

for all i>/, whence *"T** ' he ’ varian ce in this case. We are now i n 

• HQ 231 makes a positive contribution to ^ ^ n /N , we are 

aposition ttrsw that if we put every ~ of our estimator from those pai rs 

bound .0 reduce *J «» * "M), » d the sUgh * mCreaSe “ ^ 

), j. But because o ( . *' increasing the corres- 

f #(* -J) t0 *1 will be offset by a decrease m other n n , 

ponding Si-* the 

vl«“f “ ^“rTsholtH ipect 1 a net overall reduction in sampling variance. 
Moreover, since the increased $\pi_{ij}$ are only slightly increased, the compensating reduction
in the other $\pi_{ij}$ need only be small, especially if there are no fewer reduced than increased
$\pi_{ij}$.

We thus have arrived at a rather imprecise principle for improving sampling variance
with the $\pi_i$ all equal: increase the $\pi_{ij}$ from $\dfrac{n(n-1)}{N(N-1)}$ to $\dfrac{n^2}{N^2}$ wherever $|y_i - y_j|$ is large,
and decrease the $\pi_{ij}$ wherever $|y_i - y_j|$ is small. It will make our principle clearer
if we realize that when $\pi_{ij} = \pi_i\pi_j$, $y_i$ and $y_j$ must be selected by independent processes.
Thus we are investigating a procedure in which the selection scheme is broken up into
two or more sub-schemes, operated quite independently of each other. In
the notation of 39.6, we may express this in terms of a selection scheme


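For $1 \le i \le N_1$,

$${}_{(r)}p_i \;\begin{cases} > 0 & \text{for } 1 \le r \le n_1, \\ = 0 & \text{otherwise;} \end{cases}$$

for $N_1 < i \le N_1 + N_2$,

$${}_{(r)}p_i \;\begin{cases} > 0 & \text{for } n_1 < r \le n_1 + n_2, \\ = 0 & \text{otherwise;} \end{cases}$$

and so on, until for $\sum_{l=1}^{k-1} N_l < i \le N$,

$${}_{(r)}p_i \;\begin{cases} > 0 & \text{for } \sum_{l=1}^{k-1} n_l < r \le n, \\ = 0 & \text{otherwise.} \end{cases} \qquad (39.37)$$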
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into k corr^o “" inin g ty memK 
corr espondi ng sam mbers . 

1‘ = «. Each sub-population is independently sarnpled P * ° f si *» »„ 

j=si principle now tells us to choose the v. no ^ 
i Tj of different groups are as different ‘as possibWf ° f ‘ he grou P s so that 
^ $ 23) will come from pairs in different groups) while thetl' 

( « as alike as possible (for these are the pairs whose m a° tlnyo “«to«S 

’ population sub-groups, each of which is to be samtZ 
KU lts combined to estimate overall population parameters are “<* ^ 

Is being identified metaphorically with geological layers in rht V h ' grou P s 
I detailed study of stratified sampling, as our selection scheme (3937V^., For 
Shall find it convenient to start afresh with a more direct approach “ ’ 

Stratified random sampling

39.15 As in 39.14, we suppose that the population has been divided into $k$ strata,
the $l$th of these containing $N_l$ individuals, $\sum_{l=1}^{k} N_l = N$. We now specialize the general
scheme (39.37) by supposing that independently within each stratum the sample of
$n_l$ members, $\sum_{l=1}^{k} n_l = n$, is selected with equal probabilities without replacement.
Such a scheme is called stratified random sampling without replacement. Because of
the independence of the selections in the different strata, the theory is very straightforward.
We now denote a member of the $l$th stratum by $y_{li}$ and will always reserve
the first suffix $l$ for stratum identification.

Clearly, we have $\pi_{li} = n_l/N_l$ for all $i$ in the $l$th stratum. Thus the unbiassed estimator
(39.20) becomes

$$\hat\mu = \frac{1}{N}\sum_{l=1}^{k}\sum_{i=1}^{n_l} \frac{y_{li}}{n_l/N_l} = \frac{1}{N}\sum_{l=1}^{k} N_l m_l, \qquad (39.38)$$

where $m_l$ is the sample mean in the $l$th stratum, whose true mean is denoted by $\mu_l$.(*)

Since the $m_l$ are independent, the sampling variance of $\hat\mu$ is

$$V(\hat\mu) = \frac{1}{N^2}\sum_{l=1}^{k} N_l^2\, V(m_l) = \frac{1}{N^2}\sum_{l=1}^{k} N_l^2 \sigma_l^2 \Big(\frac{1}{n_l} - \frac{1}{N_l}\Big), \qquad (39.39)$$

where we have applied (39.10) separately in each stratum and $\sigma_l^2$ is the population
variance in the $l$th stratum. If we similarly write $s_l^2$ for the sample variance in the
$l$th stratum, (39.39) gives the unbiassed estimator of $V(\hat\mu)$

$$\hat V(\hat\mu) = \frac{1}{N^2}\sum_{l=1}^{k} N_l^2 s_l^2 \Big(\frac{1}{n_l} - \frac{1}{N_l}\Big). \qquad (39.40)$$
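The three results (39.38-40) can be verified on a small stratified population. The sketch below is an illustration only, not part of the original text; the strata and the stratum sample sizes are hypothetical. It enumerates every combination of within-stratum samples, all equally likely because the strata are sampled independently with equal probabilities, and compares the exact mean and variance of the estimator with the formulae.

```python
from fractions import Fraction
from itertools import combinations, product

strata = [[1, 3, 8], [10, 14, 15, 21]]     # hypothetical stratum values
n_l = [2, 2]                               # hypothetical stratum sample sizes
N_l = [len(s) for s in strata]
N = sum(N_l)

def k2(values):
    """'Variance' with divisor (number of values - 1), as defined in 39.3."""
    centre = Fraction(sum(values), len(values))
    return sum((v - centre) ** 2 for v in values) / (len(values) - 1)

def mu_hat(samples):
    """Estimator (39.38): stratum sample means weighted by N_l / N."""
    return sum(Nl * Fraction(sum(s), len(s)) for Nl, s in zip(N_l, samples)) / N

def variance_formula(variances):
    """Right-hand side of (39.39), or of (39.40) if sample variances are supplied."""
    return sum(Nl ** 2 * v * (Fraction(1, nl) - Fraction(1, Nl))
               for Nl, nl, v in zip(N_l, n_l, variances)) / N ** 2

all_samples = list(product(*(combinations(s, nl) for s, nl in zip(strata, n_l))))
estimates = [mu_hat(s) for s in all_samples]
mean = sum(estimates) / len(estimates)
variance = sum((e - mean) ** 2 for e in estimates) / len(estimates)
theoretical = variance_formula([k2(s) for s in strata])
mean_estimated = sum(variance_formula([k2(part) for part in s])
                     for s in all_samples) / len(all_samples)
```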


My 


(*) There is no danger of confusion with the notation for moments if it is remembered that
$l$ always identifies a stratum.
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«rsD T»-- f w J4 which led us to stratifi ed 

THE ADVANCE ^ disc U s s ‘°. n a uniform sampling fraction 

' „,.r investig 3 * 1 ” 1 * • ble samp ,. • 


Wish to mate a11 tlic '^constant for i‘ l ! j ^ 'our in vest ' g ““have variable sampling 

'X »- USth ;rnee^ *°\t**£j71 », - ®f ^ 

(USF). However, I? aoy choice of * which choice be chosen t0 minimize 

(39.38-40) «e must „„w do * shoU ld the »«* although , he 

Es-r>- • r 

£-S St 

°tssz, 

-if* 


E°V 


_1 

iV 2 7 


«i 


2 Ni<*i 

(39,41) 

» the reader may verify by «pa nd “?jf? Mn-n^dve^depends upon the stratum 

variance ill be minimized for choice of the 

$n_l$ when this term is zero, i.e. when

$$\frac{N_l\sigma_l}{n_l} - \frac{\sum_l N_l\sigma_l}{n} = 0$$

or

$$\frac{n_l}{N_l} = \frac{\sigma_l}{\sum_l N_l\sigma_l / n}. \qquad (39.42)$$

Thus minimum sampling variance is attained when the sampling fraction $n_l/N_l$ in
each stratum is made proportional to the square root of the population variance in that
stratum. We call this the minimum variance allocation to strata, and denote the estimator
in this case by $\hat\mu_{MV}$. The sample sizes determined in this way are in general
fractional, and in practice the nearest integers to them would be used.

It follows at once that the minimized sampling variance is (39.41) with the second
term on the right omitted, i.e. on simplifying,

$$V(\hat\mu_{MV}) = \frac{1}{N^2}\Big\{\frac{(\sum_l N_l\sigma_l)^2}{n} - \sum_l N_l\sigma_l^2\Big\}. \qquad (39.43)$$
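The allocation rule (39.42) and the minimized variance (39.43) are simple to compute. The sketch below is an illustration only, not part of the original text; the stratum sizes, standard deviations and total sample size are hypothetical. It obtains the (generally fractional) MV allocation, evaluates the stratified-sampling variance $N^{-2}\sum N_l^2\sigma_l^2(1/n_l - 1/N_l)$ for it, and compares the result with (39.43) and with the variance under a uniform sampling fraction.

```python
import math

def stratified_variance(N_l, sigma_l, n_l):
    """Sampling variance of the stratified estimator for a given allocation n_l."""
    N = sum(N_l)
    return sum(Nl ** 2 * s ** 2 * (1 / nl - 1 / Nl)
               for Nl, s, nl in zip(N_l, sigma_l, n_l)) / N ** 2

def mv_allocation(N_l, sigma_l, n):
    """Allocation (39.42): n_l proportional to N_l * sigma_l (may be fractional)."""
    total = sum(Nl * s for Nl, s in zip(N_l, sigma_l))
    return [n * Nl * s / total for Nl, s in zip(N_l, sigma_l)]

def usf_allocation(N_l, n):
    """Uniform sampling fraction: n_l proportional to N_l."""
    N = sum(N_l)
    return [n * Nl / N for Nl in N_l]

N_l = [400, 300, 300]            # hypothetical stratum sizes
sigma_l = [2.0, 5.0, 10.0]       # hypothetical stratum standard deviations
n = 100                          # hypothetical total sample size
N = sum(N_l)

n_mv = mv_allocation(N_l, sigma_l, n)
v_mv = stratified_variance(N_l, sigma_l, n_mv)
v_usf = stratified_variance(N_l, sigma_l, usf_allocation(N_l, n))
formula_39_43 = (sum(Nl * s for Nl, s in zip(N_l, sigma_l)) ** 2 / n
                 - sum(Nl * s ** 2 for Nl, s in zip(N_l, sigma_l))) / N ** 2
```

In practice the entries of `n_mv` would be rounded to integers, and an allocation exceeding the stratum size would have to be truncated; neither refinement is attempted here.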


-nations of the f ronl the va(ues 


feg" the °' fe that we made all the * equal , • 

1 9 » 3s m our original 


n 


i n 


N t a H l. 


(39.44) 


I 
















mean of the 



„.* s at once from (39.38) that in this case a rpH ^ 

It fol denote it by fW- e ** reduc es to the 

now easily seen that the second and third , '" “ K com Plete 

1 „ value but of opposite sign, and so cancel Tk* ° n the right of 
M ua hc USF allocation (39.44) is therefore given K, sam P u »g variaJ? 9 ' 41 ! Jre 

*>*41). This W ' U be greato than 

°. f (3 lht of (39.41) is zero, i.e. unless vdue unless the l w 


Oi = SW, 0,/JV, a i Uj 


w 


an* 


requires that all stratum variances be equal. I n this 


hich 

d (39.42) agree. 



ease, of course, ( 39 , 44 ) 


39.18 We now compare the sampling variance under a USF allocation,
as in 39.17, with that under equal-probabilities random sampling without stratification,
given in (39.10), which we now rewrite(*)

$$V(m_R) = \sigma^2\Big(\frac{1}{n} - \frac{1}{N}\Big) = \frac{N-n}{nN(N-1)}\Big\{\sum_l (N_l-1)\sigma_l^2 + \sum_l N_l(\mu_l-\mu)^2\Big\}; \qquad (39.45)$$

this identity holds because of the Analysis of Variance identity

$$(N-1)\sigma^2 = \sum_l (N_l-1)\sigma_l^2 + \sum_l N_l(\mu_l-\mu)^2,$$

which is (35.25) rewritten in another notation. We have seen in 39.17 that for a
USF allocation to strata,

$$V(\hat\mu_{USF}) = \frac{N-n}{nN^2}\sum_l N_l\sigma_l^2,$$

and this is to be compared with $V(m_R)$. Their difference is

$$V(m_R) - V(\hat\mu_{USF}) = \frac{N-n}{nN}\Big[\Big\{\frac{\sum_l (N_l-1)\sigma_l^2}{N-1} - \frac{\sum_l N_l\sigma_l^2}{N}\Big\} + \frac{\sum_l N_l(\mu_l-\mu)^2}{N-1}\Big]. \qquad (39.46)$$

The term in braces on the right of (39.46) is negative, since it equals

$$\frac{1}{N(N-1)}\sum_l (N_l - N)\sigma_l^2 < 0. \qquad (39.47)$$

It therefore follows at once from (39.46) that if the last term on its right is zero, i.e.
if all $\mu_l$ are equal,

$$V(m_R) < V(\hat\mu_{USF}), \qquad (39.48)$$

so that stratification with a USF results in an increase in sampling variance.
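The decomposition (39.46) can be checked exactly on a small example. The sketch below is an illustration only, not part of the original text; the stratum values and the total sample size are hypothetical, and the names `within` and `between` are labels adopted here for the term in braces and the last term of (39.46) respectively.

```python
from fractions import Fraction

strata = [[1, 3, 8], [10, 14, 15, 21], [30, 31, 35]]   # hypothetical stratum values
n = 5                                                  # hypothetical total sample size
values = [v for s in strata for v in s]
N = len(values)

def mean(xs):
    return Fraction(sum(xs), len(xs))

def variance(xs):
    """'Variance' with divisor (number of values - 1), as defined in 39.3."""
    centre = mean(xs)
    return sum((x - centre) ** 2 for x in xs) / (len(xs) - 1)

mu, sigma2 = mean(values), variance(values)
N_l = [len(s) for s in strata]
mu_l = [mean(s) for s in strata]
sigma2_l = [variance(s) for s in strata]

v_random = sigma2 * (Fraction(1, n) - Fraction(1, N))          # V(m_R)
v_usf = Fraction(N - n, n * N ** 2) * sum(Nl * s2 for Nl, s2 in zip(N_l, sigma2_l))
within = (sum((Nl - 1) * s2 for Nl, s2 in zip(N_l, sigma2_l)) / (N - 1)
          - sum(Nl * s2 for Nl, s2 in zip(N_l, sigma2_l)) / N)  # term in braces
between = sum(Nl * (ml - mu) ** 2 for Nl, ml in zip(N_l, mu_l)) / (N - 1)
difference = Fraction(N - n, n * N) * (within + between)        # right side of (39.46)
```

The sign of `difference` is settled by whether the positive `between` term outweighs the small negative `within` term, which is the comparison discussed in the text.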

Furthermore, we have already seen that when the $\sigma_l$ are all equal the USF allocation is
the same as the MV allocation, so that $V(\hat\mu_{USF}) = V(\hat\mu_{MV})$.
In these circumstances, any allocation of the sample to strata results in an increase
in the sampling variance of the estimator of the population mean.

(*) The suffix R denotes equal-probabilities random sampling without replacement.


STATISTICS 

the ADVANCED THE0 ^ ° s htly , a result like (39.48) is sti „ 
,W . differ slightly and tire * dlffer 

possible" i.e. »e may have K(WjI )< K^av)^® r g xerc ises 39.14-15. 

Armitage (1947) gives ^ in ,-#{ 

39.19 Results like• S NAfr-pYA* ~ 1), 

upon the inequality (39A ^ square bracket 1 ^ ^ Thus, jy _+ „ wi , h 

A, whereas the: other ter same ord er of magn d negligible, and the 

is „f order 1 rf the IV, are ^ ^ right 0 f (39.40; , 

iV t /iV fixed, the term m we have (39 4 ^ 

other term " and the equality 

and more generally in 39 . 16-18 we have- ^ the use of s Ratified 
which produces the improvement in sampi g produces any further im- 

sample with a USF (and the variation among the a, wh ch pr ^ at (h 

provement due to MV allocation). This matches with he^en 
end of 39.14, where only a USF was discussed, that strata should De as u 

The use of strata in sample survey designs has obvious similarity to the use of
blocks in experiment design (cf. 38.14). In each case, the grouping (of individuals, 
of experimental units) has as its aim the elimination from error (sampling, experimental) 
of the variation between groups (strata, blocks). There is, however, a difference of 
purpose as well as a similarity of method. In surveys, we are interested primarily 
in estimating the overall population mean, while the general mean is rarely of interest 
in experimental situations. This difference is a reflection of the fact that (cf. 38.3) 
experiments are concerned with hypothetical, rather than existent, populations. 

Minimum variance allocation for fixed total cost 

39.20 Before leaving the question of sample allocation to strata, we generalize
the MV allocation formula (39.42) to take account of variation in costs of sampling
between strata. We have deferred this generalization because we can now deduce
it as a special case of a general result on minimum variance allocation for fixed total
cost, which we shall also find useful in other connexions.

Suppose that in any sampling problem the sampling variance of the estimator
being used is of the form

\[ V = v_0 + \sum_{l=1}^{k}\frac{v_l}{w_l}, \tag{39.50} \]

where $v_0$ and the $v_l$ do not depend on the $w_l$, and that the cost of carrying out the
sample survey is

\[ C = c_0 + \sum_{l=1}^{k} w_lc_l, \tag{39.51} \]

where $c_0$ is appropriately labelled "overhead cost". We now write

\[ (V-v_0)(C-c_0) = \sum_l\frac{v_l}{w_l}\sum_l w_lc_l \ge \Big\{\sum_l (v_lc_l)^{1/2}\Big\}^2 \tag{39.52} \]

by the Cauchy inequality (see 2.7). The equality is attained if and only if

\[ \frac{v_l}{w_l}\Big/(w_lc_l) = \text{constant}, \quad \text{all } l. \]

The extreme right-hand side of (39.52) is independent of the $w_l$. So choosing the $w_l$
to satisfy this condition, which we rewrite

\[ w_l \propto (v_l/c_l)^{1/2}, \quad \text{all } l, \tag{39.53} \]

will minimize $(V-v_0)(C-c_0)$, i.e. it will minimize $V$ for fixed $C$ (or $C$ for fixed $V$).

39.21 In our present application, the variance is given by (39.39) and the cost
function is

\[ C = c_0 + \sum_{l=1}^{k} n_lc_l, \tag{39.54} \]

where $c_0$ is the overhead cost and $c_l$ is the cost of an observation in the $l$th stratum.
Identifying (39.39) with (39.50) we see that here

\[ v_0 = -\frac{1}{N^2}\sum_l N_l\sigma_l^2, \qquad v_l = \frac{N_l^2\sigma_l^2}{N^2}, \qquad w_l = n_l, \]

and hence (39.53) gives the MV allocation for fixed $C$ as

\[ n_l \propto \frac{N_l\sigma_l}{c_l^{1/2}}. \tag{39.55} \]

The sampling fraction $n_l/N_l$ is now to be made proportional to the stratum
standard deviation divided by the square root of the stratum cost per observation. (39.42),
our previous MV allocation formula ignoring costs, is a special case of (39.55)
when all $c_l$ are equal.
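The allocation (39.55) can be tried numerically. The sketch below is illustrative only: the function names, the argument `budget` (the fixed total cost $C$) and the use of non-integer sample sizes are assumptions of this example, and the variance is taken to have the stratified form $\sum_l (N_l/N)^2\sigma_l^2(1/n_l - 1/N_l)$ implied by the identification of $v_0$ and $v_l$ above.

```python
from math import sqrt

def mv_allocation(N_l, sigma_l, c_l, budget, overhead=0.0):
    """Sample sizes n_l proportional to N_l*sigma_l/sqrt(c_l), scaled so that
    overhead + sum(n_l*c_l) equals the budget (sizes are left unrounded)."""
    weights = [N * s / sqrt(c) for N, s, c in zip(N_l, sigma_l, c_l)]
    cost_of_weights = sum(w * c for w, c in zip(weights, c_l))
    scale = (budget - overhead) / cost_of_weights
    return [scale * w for w in weights]

def stratified_variance(N_l, sigma_l, n_l):
    """Sampling variance of the stratified estimator of the mean."""
    N = sum(N_l)
    return sum((Nl / N) ** 2 * s ** 2 * (1 / n - 1 / Nl)
               for Nl, s, n in zip(N_l, sigma_l, n_l))

N_l, sigma_l, c_l = [600, 400], [10, 20], [1, 4]
n_l = mv_allocation(N_l, sigma_l, c_l, budget=200)
print(n_l, stratified_variance(N_l, sigma_l, n_l))
```

In this example $\sigma_l/c_l^{1/2}$ is the same in both strata, so the two sampling fractions coincide; any other allocation of equal total cost, such as moving four observations into the cheaper stratum at the expense of one in the dearer, gives a larger variance.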


The formation of strata

39.22 The reader will no doubt have noticed that throughout our discussion, from
39.15 onwards, we have been assuming that a fixed set of strata are given in advance,
and that our only problem has been to allocate our sample among these strata. But
these strata must have been formed at some stage. How is this to be done from the
standpoint of ultimately minimizing the variance of estimation? We first discuss this
with $k$ fixed, and later consider the effect of varying $k$.

In view of the conclusion of 39.^ d l * p0 ,„ts 

the variation between the str t. oU ld select (k 
variable y in the population, we 


N 






wanted 
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to form k strata 


F statistic 

. ch osen? The problem j, 

THE ADVA>—" points be cn ^ distribution of 

How should these « b"° W h Qr the values of a variabi^ 

,0 torn. , «° se that we neve - Jy, , 0 gu de u ' ob lem ts of prac, icaI 

theoretical, m f h l e past values of the hattheS olu ton t 

• be used. Ignoring constants "J 

in U The has, . ^ . USF »f we rewrite the satnp,;^ / 




variance (39.45) as 


\[ V(\hat\mu_{USF}) \propto V = \sum_{l=1}^{k}\int_{c_{l-1}}^{c_l} (y-\mu_l)^2 f(y)\,dy, \tag{39.56} \]

where the distribution of $y$ in the population is represented by $f(y)$, and
$c_1, c_2, \ldots, c_{k-1}$ are the cutting points in the range of $y$ which determine the strata
boundaries ($c_0$ and $c_k$ being the extremes of that range). To minimize (39.56) for choice
of the $c$'s, we put

\[ 0 = \frac{\partial V}{\partial c_l} = (c_l-\mu_l)^2 f(c_l) - (c_l-\mu_{l+1})^2 f(c_l), \qquad l = 1, 2, \ldots, k-1, \]

so that $(c_l-\mu_l)^2 = (c_l-\mu_{l+1})^2$, and since

\[ \mu_l < c_l < \mu_{l+1} \]

this implies

\[ c_l = \tfrac{1}{2}(\mu_l+\mu_{l+1}). \tag{39.57} \]

We therefore choose our cutting points so that they are half-way between the means
of the strata they form. Given $f(y)$ and $k$, this is not difficult to achieve numerically.
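One simple way of satisfying (39.57) numerically, offered here as an assumption rather than as the method the text has in mind, is a fixed-point iteration: form strata from the current cutting points, compute the stratum means, move each cutting point to the mid-point of adjacent means, and repeat until nothing changes. The sketch below applies this to a finite list of $y$-values standing in for $f(y)$; the function name and the equally spaced starting points are hypothetical choices.

```python
from bisect import bisect_right
from statistics import mean

def usf_cut_points(values, k, iterations=100):
    """Iterate c_l = (mu_l + mu_{l+1})/2 on a finite list of y-values."""
    ys = sorted(values)
    lo, hi = ys[0], ys[-1]
    # start from equally spaced cutting points over the range of y
    cuts = [lo + (hi - lo) * l / k for l in range(1, k)]
    for _ in range(iterations):
        # values <= c_l fall in the l-th stratum
        edges = [0] + [bisect_right(ys, c) for c in cuts] + [len(ys)]
        strata = [ys[a:b] for a, b in zip(edges, edges[1:])]
        if any(not s for s in strata):
            raise ValueError("empty stratum; try fewer strata or another start")
        means = [mean(s) for s in strata]
        new_cuts = [(m1 + m2) / 2 for m1, m2 in zip(means, means[1:])]
        if new_cuts == cuts:        # cutting points have stopped moving
            break
        cuts = new_cuts
    return cuts

print(usf_cut_points([1, 2, 3, 4, 11, 12, 13, 14], k=2))
```

The iteration may settle on a local rather than a global minimum of (39.56), so in practice several starting points would be worth trying.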

39.24 If, on the other hand, we are to use the MV allocation sample sizes after
the strata are formed, (39.43) is to be minimized by choice of the cutting points. The
reader can verify by substituting (39.42) into (39.39) that the second term in braces
in (39.43) arises only if the sampling fractions $n_l/N_l$ are not negligible. We neglect
these fractions. Ignoring constants $n$, $N$, (39.43) is then rewritten

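\[ V(\hat\mu_{MV}) \propto D^2 = \Big[\sum_{l=1}^{k}\Big\{\int_{c_{l-1}}^{c_l} f(y)\,dy\int_{c_{l-1}}^{c_l} (y-\mu_l)^2 f(y)\,dy\Big\}^{1/2}\Big]^2. \tag{39.58} \]

We need only minimize $D$, so that we put $\partial D/\partial c_l = 0$, $l = 1, 2, \ldots, k-1$.
Only the $l$th and $(l+1)$th terms of the sum involve $c_l$, and this reduces, on cancelling
the factor $f(c_l)$ throughout and writing $\sigma_l^2$ for the variance of $y$ within the $l$th
stratum, to

\[ \frac{\sigma_l^2+(c_l-\mu_l)^2}{\sigma_l} = \frac{\sigma_{l+1}^2+(c_l-\mu_{l+1})^2}{\sigma_{l+1}}, \qquad l = 1, 2, \ldots, k-1. \tag{39.59} \]

If $\sigma_l$ is the same for all strata, (39.59) reduces to (39.57), as it must, but in general
these equations are not so easily satisfied as (39.57), since the $\sigma_l$ are involved (as we
should expect).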

5 „ MS lar g e, we may assume^) t0 be ^^ 


nual'to fi- Then 


constant within each 




stratum, 


^ a i * ts(ci“Cm)*. 

K variance of a uniform distribution. Thus the expression to be minimized in (39.43) 
; tQ t ^ e s ame order as in 39.24, proportional to 

If we now define the transformation 


(39.60) may be rewritten 


z{y) = {/(*)}**» 


(39.61) 




lS l \ Hi 1 \ v, 

We therefore require to minimize 

rajplfthe wattrmin (39.61) and determine the cutnng pom s 

of the equation . « fe_l. t 39 ' 

^r) = f . i, du e,showhowto 

, q57 * 195 9 ), to whom this ^^alternative appm* 1 "' 0 ” 

Dalenius and Hodges ( ’ c i 0 ser appwxi® 8 * 1011 ’ . w 

use it numerically and “btarnj^ by writing , more (39.fi) 

given by Ekman (195 ) ther , 

2 Ni<Ji' v/12 1 ru is constant and 

that 2 ] .miivalently 

It follows from our discussion a o« c0 „ st ant, -r ^ (39.^ 

fore minimize (39.63) by 

• iKr the use performed 

. j numcticaby t.j they P 

Cochran (1961) examtned n foun d th 

for small fe (equal to 2, 3 or 4) an 




s TATlS'f lCS 

_ aD vanced theory 0 ^ ^ discusses other> less sa% 

186 1 • skew distributioi 15 z'39.62) and (39.64\ 

„lied to eight represenmttv® 59) s°^°* e[e s o, the MV alloca. 

£ory, a PP ro f iffl S that each 0 [[ strata. But ‘ t every stratum. Th Us 

le should b ‘ » be constant over a const n w a van 

implies that N ,^‘‘‘ tha t we sh ° u p)e size in eacn t are chosen to mini. 

be Tr n a Exercise 39.18; Cochran 

- S==■»■ - - •" *. mv- j-j-sr*; 

39.26 If we always samp ^ never increase vai ian y garnple 0 f n observa- 

are large, (39.49) as ^ r ^ S ^ is led logically to the conclusion if we want 

form sub-strata, so that onW g one obse rvation tom* observations 

*» ^l^ttatl tpling variance (requiring — ^ 

in each stratum) with k = [i«] st J' ata / ^ * 1S g00 d deal to be said for doing this; 
knowledge of the underlying jtowbuton, t0 be justified, for there is a good deal 

but otherwise the labour involved IS h y y attainable variance declines 

Of empirical evidence drat as mcreases, the - + ^ as gQod as the best . 

Z".fettlt our knowledge of the underlying distribution is usually 
imprecise. Cochran (1963) and Dalenius (1953) give numerical examples.

A detailed empirical study of the effect of strata formation and sample allocation on
estimator variance was made by Hess et al. (1966).





39.27 Finally, we remark that the effect of any stratification upon sampling variance
can always be estimated after sampling. To do this, we need only use (39.40) to estimate
the variance of $\hat\mu$ and compare this with an estimate of the variance of $m_R$, given in
suitable form at the beginning of 39.18. From that formula, we see that the only
problem is to estimate

\[ B = \sum_l N_l(\mu_l-\mu)^2. \]


Now because 


~ f N ‘ K ~N ? N ’ r ^ + f^ N lNrl*,fl r }. 

£ W-^%) = if/, _«j\ 

»A nJ’ 


(39.65) 




i~ai 


n i\ N,)j = K 


we have 
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fr o» (39-65). an unbiassed ^ ' ** SIG Ns 

^ 4 /. * ofB 




flie 


1 


(39.66) 
from 39.18 


«' "/>■' »iV _ sr, 

^„ r —1 

i>( m \ _ JV-n _ ul erelore 

a) nN(N-l)$( N i~ ltf+fy 

* is defined by (39.66). 

6 

Sample designs: clustering

"t# We were led to the principle of stratification by out disc ■ 

? effects of varying the probabilities n {) while the n- were all fi a 810,1 in 39 -W 

of th that it would be P rofitable t0 increase some n u slightw^ f " /JV; we s “ 

* ere sate this) as compared with their equal-probabilitiL d r , educ ° the 0,hers 

10 it may not be worth while to’pjS^rS ^ 

n° w a t* as possible. From their definitions in 39.6 we see hT “ S ° me ot the 
«£$£ *, = n/N, n^n/N. see that ^ 

Suppose, then, that we divide the $N$ individuals in the population into $N_1$ groups,
each containing $N_2$ individuals (so that $N_1N_2 = N$), and that for all pairs $i, j$ within
any group we put $\pi_{ij} = \pi_i = \pi_j = n/N$. There are $N_2(N_2-1)$ pairs within each
group and hence $N_1N_2(N_2-1) = N(N_2-1)$ pairs $i, j$ for which $\pi_{ij}$ is thus increased.
From (39.16), all the $N(N-1)$ $\pi_{ij}$ in the population must add to $n(n-1)$, so that the
$N(N-1)-N(N_2-1) = N(N-N_2)$ pairs $i, j$ whose $\pi_{ij}$ has not been increased must
be allotted values of $\pi_{ij}$ adding to $n(n-1)-N(N_2-1)\cdot n/N = n(n-N_2)$. If we make
all these values of $\pi_{ij}$ equal, each will have value

\[ \frac{n(n-N_2)}{N(N-N_2)}. \]

Suppose that we choose $n$ to be a multiple of $N_2$, say $n_1N_2$. Then these

\[ \pi_{ij} = \frac{n_1(n_1-1)}{N_1(N_1-1)}. \]

Now we recognise from 39.6 that this is the value which $\pi_{ij}$ would take if $n_1$ out of
$N_1$ individuals were selected using equal-probabilities random sampling. Since
$\pi_{ij} = \pi_i$ within groups implies that each group is selected as a whole or not at all,
we see that our present sample design consists of dividing the population
into $N_1$ equal groups (called clusters) and selecting $n_1$ of these with equal probabilities at
random.
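The pairwise selection probabilities of 39.28 can be checked by brute force. The following sketch, whose function name and labelling of individuals $0, 1, \ldots, N-1$ are assumptions of this example, enumerates every equally likely choice of $n_1$ whole clusters out of $N_1$ and accumulates the exact $\pi_{ij}$ for each ordered pair of individuals.

```python
from fractions import Fraction
from itertools import combinations

def cluster_pair_probabilities(N1, N2, n1):
    """Exact pi_ij when n1 whole clusters out of N1 (each of size N2) are
    selected with equal probabilities; keys are ordered pairs (i, j), i != j."""
    clusters = [range(c * N2, (c + 1) * N2) for c in range(N1)]
    samples = list(combinations(range(N1), n1))
    prob = Fraction(1, len(samples))          # each set of clusters equally likely
    pi = {}
    for chosen in samples:
        members = [i for c in chosen for i in clusters[c]]
        for i in members:
            for j in members:
                if i != j:
                    pi[(i, j)] = pi.get((i, j), 0) + prob
    return pi

pi = cluster_pair_probabilities(N1=4, N2=3, n1=2)   # N = 12, n = 6
print(pi[(0, 1)], pi[(0, 3)], sum(pi.values()))
```

Here individuals 0 and 1 share a cluster, so their $\pi_{ij}$ equals $n/N$, while 0 and 3 lie in different clusters and receive $n_1(n_1-1)/\{N_1(N_1-1)\}$; the total over all ordered pairs is $n(n-1)$, as (39.16) requires.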


39.29 A special case of cluster sampling is systematic sampling, in which the population
is arranged (either physically or by means of a list) in a single sequence of $N = N_1N_2$
individuals. From among the first $N_1$ of these, a single individual is selected at
random with equal probabilities of selection. If the $p$th individual is selected, the
systematic sample consists of the individuals in positions $p$, $p+N_1$, $p+2N_1$, ...,
$p+(N_2-1)N_1$. Thus only $N_1$ samples are possible, and there is a
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statistics 

„_ n theory of duster sam pi ing dcs _ 

the ADVANCE fflp ling and .*j them in the literature 

. , betw een ****%£%^Tmhioh are f «***■#» 

formal identity . , futures _ n j 0 ys clus . -^Jiich (ns the nam e 1 
comp e e 23 but a f eW s ^ ,j c sampling e s 0 f clus ter 0 p U lation. Second 

crib e d ‘ Set that system** with other fe«£ physical pop ^ «, 

in XmtamSngi *• «“** il 

implies) use sets of * , ists which** bilities rtmd°* pllin g. But the most < 

svstematic sampling • fof equal P uas ,-random coin monly find that 

1 practically easier 8 etiro es kno atic sampling i[nmedia tely renders 

in these circumstance* ^ that sy . ^ ^ = 1. 1» vailab ility of supple. 

important f d fe'“x po ssiblesam P' eS impossible withoUt y that %?2 so that valid 
only ° ne of sampling simpler to insist tn 

The com.i- 

»»»“•'“ *'&-"» •* - '“ 

butions from the pairs W w ^ contributions from the other pairs, 


TtiTtj-Xij - A/ 


__ . for these pairs. — ‘ tKT -n\ 

N) The 

, ,,, he oositive, since for them jrrt-% jVf(JVj —1) 

different clusters, I „ a | ue s of I y-v< I in the same cluster 

argument of 39.14 now applies: if we put the large perhaps more dramatic . 

with jr« maximized, we should exp ( • „ n erate to reduce (39.23), instead 


in 


ihould expect reduce ( 39 . 23 ), instead 

ali; than in 39.14, since the larger values of now operat 


the opposite direction from the principle of stratification we have discussed in 39.14:
form the population into internally heterogeneous groups to reduce cluster sampling
variance below equal-probabilities random sampling variance.

As with stratification, we shall now abandon the general framework and enter into
a particularized discussion of the details, but we make two general points here.

The primary distinction between stratification and clustering is that every stratum
is sampled, while clusters themselves are subject to a selection procedure; it is this
fact which leads the principles for the two methods in opposite directions. In stratified
sampling, the sampling variability is confined within strata and we construct strata to
minimize it; in cluster sampling, there is only between-cluster
variability, since every cluster is sampled entire, and we construct clusters to minimize
it.

Secondly, although it is explicit in 39.14 and 39.28, we remark that neither stratified
sampling with a USF nor cluster sampling with all clusters of equal size makes any change
in the individual selection probabilities $\pi_i$, which are $n/N$ in each case, just as for
equal-probabilities random sampling: they are methods of modifying the joint selection
probabilities $\pi_{ij}$. Of course, in stratified sampling with variable sampling fractions,
the $\pi_i$ are themselves affected. The same applies to cluster sampling when cluster sizes
are unequal. We discuss this more general situation below, and at the same time generalize
in another direction.

Multi-stage sampling

,otl Cluster sampling presupposes a Pm , • 

„f the groups then being selected. l t f s Pw 8 °f the pop „ w . 

S ° m uion where the groups (clusters) are the subject oTf ‘u cons ^ 'h e l"’ a “K 

ss. *■ 

A population of $N$ members is grouped into $N_1$ first-stage units (previously
called clusters). The $i$th such unit contains $N_{i2}$ second-stage units, the $j$th of which
contains $N_{ij3}$ third-stage units. This hierarchical process can be carried further,
but we shall not consider more than three stages, and indeed it is sometimes enough
for our purposes to consider only two stages.

With three stages, we have

\[ N = \sum_{i=1}^{N_1}\sum_{j=1}^{N_{i2}} N_{ij3}. \]

At the first stage, $n_1$ (out of $N_1$) first-stage units are selected, by a method as yet
unspecified. Within the $i$th of these, $n_{i2}$ second-stage units are selected, from the $j$th
of which $n_{ij3}$ third-stage units are selected, and sample size is

\[ n = \sum_{i=1}^{n_1}\sum_{j=1}^{n_{i2}} n_{ij3}. \]

We assume that sampling at any stage may be with unequal probabilities; that selection
at any stage is independent of selection at other stages, and that sampling within any
unit at a given stage is independent of the sampling within other units at that stage.

39.32 We first have to determine the form of the unbiassed estimator. The
general theory of selection with unequal probabilities in 39.6-8 applies here, and in
particular (39.20) gives an unbiassed estimator of $\mu$, and (39.23) and (39.24) remain
valid expressions for the sampling variance and an estimator of it. The
$\pi_i$ and $\pi_{ij}$ in these expressions refer, of course, to overall probabilities of
selection, taking account of all stages. We now write the $y$-values in
the population as $y_{ijk}$, each suffix corresponding to a division into units
at a stage, so that $y_{ijk}$ is the population value in the $k$th third-stage unit
in the $j$th second-stage unit within the $i$th first-stage unit. In this notation, the
estimator (39.20) becomes

\[ \hat\mu = \frac{1}{N}\sum_{i=1}^{n_1}\sum_{j=1}^{n_{i2}}\sum_{k=1}^{n_{ij3}}\frac{y_{ijk}}{\pi_{(ijk)}}, \tag{39.68} \]

where $\pi_{(ijk)}$ is the probability of selection of $y_{ijk}$. The parentheses in the suffix
distinguish it from the joint probabilities previously used.

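For example, suppose that sampling is with equal probabilities at each stage.
Then

\[ \pi_{(ijk)} = \frac{n_1}{N_1}\cdot\frac{n_{i2}}{N_{i2}}\cdot\frac{n_{ij3}}{N_{ij3}} \]

and (39.68) reduces to

\[ \hat\mu = \frac{N_1}{Nn_1}\sum_{i=1}^{n_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{n_{i2}}\frac{N_{ij3}}{n_{ij3}}\sum_{k=1}^{n_{ij3}} y_{ijk}. \tag{39.69} \]

If, further, $N_{i2} = N_2$, $n_{i2} = n_2$, all $i$, and $N_{ij3} = N_3$, $n_{ij3} = n_3$, all $i, j$,
(39.69) reduces to

\[ \hat\mu = \frac{N_1N_2N_3}{Nn_1n_2n_3}\sum_i\sum_j\sum_k y_{ijk} = \frac{1}{n}\sum_i\sum_j\sum_k y_{ijk}, \tag{39.70} \]

the overall sample mean, since in this case $N = N_1N_2N_3$, $n = n_1n_2n_3$. (39.70) is
intuitively obvious in this symmetrical situation.

The two-stage case is obtained by putting $N_{ij3} = n_{ij3} = 1$, which makes the suffix $k$ (and
hence its summation) redundant.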
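The estimator (39.69) is easily computed from nested sample data. In the sketch below the function name and the data layout are assumptions of this example: each selected first-stage unit is given as a pair ($N_{i2}$, list of selected second-stage units), and each selected second-stage unit as a pair ($N_{ij3}$, list of observed $y$-values).

```python
def three_stage_estimate(sample, N, N1):
    """Estimator (39.69) for equal-probabilities sampling at every stage.

    sample: list of (N_i2, second_stage) for the selected first-stage units,
            where second_stage is a list of (N_ij3, observed y-values).
    N:  number of individuals in the population.
    N1: number of first-stage units in the population.
    """
    n1 = len(sample)
    total = 0.0
    for N_i2, second_stage in sample:
        n_i2 = len(second_stage)
        # each observed total is inflated by N_ij3/n_ij3
        inner = sum(N_ij3 / len(ys) * sum(ys) for N_ij3, ys in second_stage)
        total += N_i2 / n_i2 * inner
    return N1 / (N * n1) * total

# A complete census of a population of six individuals returns the population mean.
census = [(2, [(3, [1, 2, 3]), (2, [4, 5])]), (1, [(1, [6])])]
print(three_stage_estimate(census, N=6, N1=2))
```

In the symmetrical situation the same function returns the plain sample mean, in agreement with (39.70).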

39.34 Just as in our treatment of stratified sampling, we shall find it more convenient
to make a direct approach in discussing the sampling variance of our estimator
in multi-stage sampling, rather than to persist with the general unequal-probabilities
notation; this will avoid, for example, the use of symbols like $\pi_{(ijk)(tuv)}$ for the joint
probability of selecting two values $y_{ijk}$, $y_{tuv}$. We shall consider in detail two special
cases, the first of which is that of sampling with equal probabilities at every stage.

39.35 Consider, then, the sampling variance of the estimator $\hat\mu$ at (39.69). It
is obvious that each stage of sampling contributes to the variability of $\hat\mu$. Since
sampling is independent at the different stages, we divide the variation of $\hat\mu$ into a
sequence of conditional variations. First, we consider its variance at the last stage
conditional upon earlier-stage selections being fixed; then we allow the penultimate-stage
selections to vary, and so on until the first-stage selections are varied. With
three stages, for example, we take the variance at the third stage, conditional upon the
first two stages' selections being fixed, then allow the second-stage selections to
vary, and finally allow the first-stage selection process to vary. Symbolically, we write this


To evaluate (39.71), we shall need a general result on conditional expectations and variances.

39.36 Consider a random variable $x$ and a condition $c$. By the multiplication theorem
for probabilities,

\[ E(x) = E\{E(x \mid c)\}, \tag{39.72} \]

expressing symbolically the fact that in finding the expectation of $x$ we may first, for
any condition $c$, find the expectation of $x$ given that condition, and then remove
the condition by taking the expectation over $c$. The variance of $x$ is obtained by
using (39.72) in the identity

\[ E\big[E(x^2 \mid c)-\{E(x \mid c)\}^2\big] + E\big[\{E(x \mid c)-E(x)\}^2\big] = E(x^2)-\{E(x)\}^2. \]

By definition, the first term on the left is the expected value of the
conditional variance of $x$ given $c$; the second term on the left is the variance of the
conditional expectation of $x$ given $c$; and the extreme right-hand side is the unconditional
variance of $x$. Thus, symbolically, we have

\[ V(x) = E\{V(x \mid c)\} + V\{E(x \mid c)\}: \tag{39.73} \]

the unconditional variance is the mean of the conditional variance plus the variance
of the conditional mean. Note that if $E(x \mid c)$ does not depend upon $c$, the second term
in (39.73) is zero, and $V(x)$ is simply the mean of the conditional variance.

The result is quite general: for example, it was, in effect, used in 17.35 to establish
the Rao-Blackwell method of improving estimators through sufficient statistics.
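The decomposition (39.73) can be verified exactly on a small discrete example. In the sketch below, which is illustrative only, a "condition" is represented (by assumption) as a pair consisting of its probability and a list of $x$-values that are equally likely given that condition; exact fractions are used so that the identity holds without rounding error.

```python
from fractions import Fraction

def moments(xs):
    """Mean and variance (divisor len(xs)) of equally likely values xs."""
    m = Fraction(sum(xs), len(xs))
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

def variance_decomposition(conditions):
    """conditions: list of (P(c), x-values equally likely given c).

    Returns (E{V(x|c)}, V{E(x|c)}, V(x)) as exact fractions.
    """
    stats = [(p, *moments(xs)) for p, xs in conditions]
    overall_mean = sum(p * m for p, m, _ in stats)
    mean_of_var = sum(p * v for p, _, v in stats)
    var_of_mean = sum(p * (m - overall_mean) ** 2 for p, m, _ in stats)
    # unconditional variance computed directly as E(x^2) - {E(x)}^2
    ex2 = sum(p * Fraction(sum(x * x for x in xs), len(xs)) for p, xs in conditions)
    return mean_of_var, var_of_mean, ex2 - overall_mean ** 2

half = Fraction(1, 2)
print(variance_decomposition([(half, [1, 2, 3]), (half, [10, 20])]))
```

The first two returned quantities always add to the third; when every conditional mean is the same, the second is zero, as noted after (39.73).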


39.37 Using (39.73), we now see that (39.71) may be written

\[ V(\hat\mu) = E_{12}\{V_3(\hat\mu)\} + V_{12}\{E_3(\hat\mu)\}, \tag{39.74} \]

where we write the symbol "12" for the first- and second-stage conditioning. (39.74)
is in two parts, which we now evaluate separately.

Consider first the value of $V_3(\hat\mu)$. At the third (more generally, the last) stage of
selection, each of the $\sum_{i=1}^{n_1} n_{i2}$ selected second-stage units is sampled
with equal probabilities; the sampling is that of a stratified sample with each
second-stage unit playing the role of a stratum. Defining

\[ m_{ij} = \frac{1}{n_{ij3}}\sum_{k=1}^{n_{ij3}} y_{ijk}, \]

we know that

\[ V_3(m_{ij}) = \frac{\sigma_{ij}^2}{n_{ij3}}\Big(1-\frac{n_{ij3}}{N_{ij3}}\Big), \]

where $\sigma_{ij}^2$ is the population variance in the second-stage unit from which this sample of
$n_{ij3}$ was drawn. It follows from (39.69) that

\[ V_3(\hat\mu) = \Big(\frac{N_1}{Nn_1}\Big)^2\sum_{i=1}^{n_1}\Big(\frac{N_{i2}}{n_{i2}}\Big)^2\sum_{j=1}^{n_{i2}} N_{ij3}^2\,\frac{\sigma_{ij}^2}{n_{ij3}}\Big(1-\frac{n_{ij3}}{N_{ij3}}\Big). \tag{39.75} \]

We now find the expectation of (39.75) when the earlier-stage selections are allowed to vary.
When the second-stage selections are allowed to vary, we obtain

\[ E_2V_3(\hat\mu) = \Big(\frac{N_1}{Nn_1}\Big)^2\sum_{i=1}^{n_1}\frac{N_{i2}^2}{n_{i2}}\,E_2\Big[\frac{1}{n_{i2}}\sum_{j=1}^{n_{i2}} N_{ij3}^2\,\frac{\sigma_{ij}^2}{n_{ij3}}\Big(1-\frac{n_{ij3}}{N_{ij3}}\Big)\Big], \]

and since the expression in square brackets is a sample mean whose expectation is the
corresponding population mean, this is

\[ \Big(\frac{N_1}{Nn_1}\Big)^2\sum_{i=1}^{n_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{N_{i2}} N_{ij3}^2\,\frac{\sigma_{ij}^2}{n_{ij3}}\Big(1-\frac{n_{ij3}}{N_{ij3}}\Big). \]

Similarly, with the first-stage selections allowed to vary,

\[ E_{12}V_3(\hat\mu) = \frac{N_1}{N^2n_1}\sum_{i=1}^{N_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{N_{i2}} N_{ij3}^2\,\frac{\sigma_{ij}^2}{n_{ij3}}\Big(1-\frac{n_{ij3}}{N_{ij3}}\Big). \tag{39.76} \]

We have thus evaluated the first term on the right of (39.74).

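39.38 For the second term in (39.74), we first need the value of $E_3(\hat\mu)$. From
(39.69), this is

\[ E_3(\hat\mu) = \frac{N_1}{Nn_1}\sum_{i=1}^{n_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{n_{i2}} N_{ij3}\,\mu_{ij}, \tag{39.77} \]

where $\mu_{ij}$ is the population mean corresponding to the sample mean $m_{ij}$. We write
$T_{ij} = N_{ij3}\mu_{ij}$ for the total of the $y$-values in this $(i,j)$th unit. We now re-apply (39.73)
to the second term on the right of (39.74), which then becomes

\[ V(\hat\mu) = E_1E_2\{V_3(\hat\mu)\} + E_1[V_2\{E_3(\hat\mu)\}] + V_1[E_2\{E_3(\hat\mu)\}]. \tag{39.78} \]

The last term on the right of (39.78) can now be obtained from (39.77), for, as before,

\[ E_2\{E_3(\hat\mu)\} = \frac{N_1}{Nn_1}\sum_{i=1}^{n_1}\frac{N_{i2}}{n_{i2}}\,E_2\Big[\sum_{j=1}^{n_{i2}} T_{ij}\Big] = \frac{N_1}{Nn_1}\sum_{i=1}^{n_1} T_i, \qquad T_i = \sum_{j=1}^{N_{i2}} T_{ij}; \]

hence the variance required is that of $N_1/N$ times a sample mean, and is therefore

\[ V_1[E_2\{E_3(\hat\mu)\}] = \Big(\frac{N_1}{N}\Big)^2\frac{\sigma_{..}^2}{n_1}\Big(1-\frac{n_1}{N_1}\Big), \tag{39.79} \]

where $\sigma_{..}^2$ is the variance between the totals $T_i$ of the $y$-values in the first-stage units.

The middle term on the right of (39.78) requires the variance

\[ V_2\{E_3(\hat\mu)\} = \Big(\frac{N_1}{Nn_1}\Big)^2\sum_{i=1}^{n_1} N_{i2}^2\,\frac{\sigma_{i.}^2}{n_{i2}}\Big(1-\frac{n_{i2}}{N_{i2}}\Big), \tag{39.80} \]

where $\sigma_{i.}^2$ is the variance between second-stage unit totals $T_{ij}$ within the $i$th first-stage
unit. (39.80) now gives

\[ E_1[V_2\{E_3(\hat\mu)\}] = \frac{N_1}{N^2n_1}\sum_{i=1}^{N_1} N_{i2}^2\,\frac{\sigma_{i.}^2}{n_{i2}}\Big(1-\frac{n_{i2}}{N_{i2}}\Big). \tag{39.81} \]

39.39 Thus, substituting (39.76), (39.79) and (39.81) into (39.78), we finally have

\[ V(\hat\mu) = \Big(\frac{N_1}{N}\Big)^2\frac{\sigma_{..}^2}{n_1}\Big(1-\frac{n_1}{N_1}\Big) + \frac{N_1}{N^2n_1}\sum_{i=1}^{N_1} N_{i2}^2\,\frac{\sigma_{i.}^2}{n_{i2}}\Big(1-\frac{n_{i2}}{N_{i2}}\Big) + \frac{N_1}{N^2n_1}\sum_{i=1}^{N_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{N_{i2}} N_{ij3}^2\,\frac{\sigma_{ij}^2}{n_{ij3}}\Big(1-\frac{n_{ij3}}{N_{ij3}}\Big). \tag{39.82} \]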
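(39.82) shows, as was obvious from (39.78) and indeed from intuitive considerations,
that there is a contribution to the variance of the estimator from each stage of sampling.

In the symmetrical situation where

\[ N_{i2} = N_2,\ n_{i2} = n_2, \text{ all } i; \quad N_{ij3} = N_3,\ n_{ij3} = n_3, \text{ all } i, j; \quad N = N_1N_2N_3; \quad n = n_1n_2n_3, \]

the estimator $\hat\mu$ here reduced at (39.70) to the overall sample mean. Writing $\sigma_{..}^2$ and
$\sigma_{i.}^2$ for the variances between unit totals at the first and second stages, (39.82) reduces to

\[ V(\hat\mu) = \frac{\sigma_{..}^2}{N_2^2N_3^2\,n_1}\Big(1-\frac{n_1}{N_1}\Big) + \frac{1}{N_1N_3^2\,n_1n_2}\sum_i\sigma_{i.}^2\Big(1-\frac{n_2}{N_2}\Big) + \frac{1}{N_1N_2\,n_1n_2n_3}\sum_i\sum_j\sigma_{ij}^2\Big(1-\frac{n_3}{N_3}\Big). \]

If we now define

\[ \sigma_1^2 = \frac{1}{N_1-1}\sum_{i=1}^{N_1}(\mu_i-\mu)^2 = \frac{\sigma_{..}^2}{(N_2N_3)^2}, \]
\[ \sigma_2^2 = \frac{1}{N_1(N_2-1)}\sum_{i=1}^{N_1}\sum_{j=1}^{N_2}(\mu_{ij}-\mu_i)^2 = \frac{1}{N_1N_3^2}\sum_i\sigma_{i.}^2, \tag{39.83} \]
\[ \sigma_3^2 = \frac{1}{N_1N_2(N_3-1)}\sum_{i=1}^{N_1}\sum_{j=1}^{N_2}\sum_{k=1}^{N_3}(y_{ijk}-\mu_{ij})^2 = \frac{1}{N_1N_2}\sum_i\sum_j\sigma_{ij}^2, \]

we may rewrite (39.82) in this symmetrical case as

\[ V(\hat\mu) = \frac{\sigma_1^2}{n_1}\Big(1-\frac{n_1}{N_1}\Big) + \frac{\sigma_2^2}{n_1n_2}\Big(1-\frac{n_2}{N_2}\Big) + \frac{\sigma_3^2}{n_1n_2n_3}\Big(1-\frac{n_3}{N_3}\Big), \tag{39.84} \]

which makes the extension to further stages of sampling obvious in the symmetrical case.
The general formula for $p > 2$ stages is

\[ V(\hat\mu) = \sum_{r=1}^{p}\frac{\sigma_r^2}{n_1n_2\cdots n_r}\Big(1-\frac{n_r}{N_r}\Big), \tag{39.85} \]

where $\sigma_r^2$ is defined at the $r$th stage by obvious extension of (39.83).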
39.40 If any sampling fraction is unity, the corresponding term in (39.82)
disappears. In particular, if every $n_{ij3} = N_{ij3}$, the last summation on the right
vanishes, and the remaining two terms give $V(\hat\mu)$ for two-stage sampling. If every
$n_{i2} = N_{i2}$ in addition, only the first term on the right survives, and we are back at
the most general form of cluster sampling with unequal-size clusters. Similarly,
the first term of (39.84) applies to equal-size cluster sampling.
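For a very small population the symmetrical variance formula can be checked by complete enumeration. The sketch below treats the two-stage case, in which the second-stage units are the individuals themselves, so that only the first two terms of (39.84) are present; the function names and the representation of the population as a list of equal-length tuples are assumptions of this example.

```python
from fractions import Fraction
from itertools import combinations, product

def exact_variance_two_stage(pop, n1, n2):
    """Variance of the sample mean over every equally likely two-stage sample:
    n1 units chosen from pop, then n2 individuals from each chosen unit."""
    means = []
    for units in combinations(pop, n1):
        for picks in product(*(combinations(u, n2) for u in units)):
            ys = [y for pick in picks for y in pick]
            means.append(Fraction(sum(ys), len(ys)))
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)

def formula_variance_two_stage(pop, n1, n2):
    """sigma_1^2/n1 (1 - n1/N1) + sigma_2^2/(n1 n2) (1 - n2/N2)."""
    N1, N2 = len(pop), len(pop[0])
    unit_means = [Fraction(sum(u), N2) for u in pop]
    mu = sum(unit_means) / N1
    s1 = sum((m - mu) ** 2 for m in unit_means) / (N1 - 1)
    s2 = sum((y - m) ** 2 for u, m in zip(pop, unit_means) for y in u) / (N1 * (N2 - 1))
    return (s1 / n1 * (1 - Fraction(n1, N1))
            + s2 / (n1 * n2) * (1 - Fraction(n2, N2)))

pop = [(1, 2, 6), (3, 5, 10), (4, 4, 7)]
print(exact_variance_two_stage(pop, 2, 2), formula_variance_two_stage(pop, 2, 2))
```

Because all units are of equal size, every sample enumerated is equally likely, and the two results agree exactly.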

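There is, in fact, no difficulty in seeing how (39.82) would extend for further stages
of sampling. A fourth stage would add to the right-hand side the term

\[ \frac{N_1}{N^2n_1}\sum_{i=1}^{N_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{N_{i2}}\frac{N_{ij3}}{n_{ij3}}\sum_{k=1}^{N_{ij3}} N_{ijk4}^2\,\frac{\sigma_{ijk}^2}{n_{ijk4}}\Big(1-\frac{n_{ijk4}}{N_{ijk4}}\Big), \tag{39.86} \]

and $\sigma_{ij}^2$ in (39.82) would have to be replaced by $\sigma_{ij.}^2$, defined as an obvious extension
of $\sigma_{..}^2$ and $\sigma_{i.}^2$, the variances between unit totals at the first two stages.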

However, it is extremely rare in practice for multi-stage sampling to use equal-probabilities
sampling throughout. The reason is, quite simply, that the variances
in (39.82) are variances between totals of the variable $y$ in the different units. When
the units vary considerably in size (i.e. contain widely different numbers of next-stage
units), this tends to make the variance of $\hat\mu$ very large. As we have already
seen, in the symmetrical case of equal-size units the variances can be
redefined as variances between means. In
general, however, we are obliged to seek some other sampling scheme to reduce the
sampling variance of $\hat\mu$ to acceptable size. This in fact we achieve by sampling
with varying probabilities. We may, of course, use any sets of probabilities whatever
at each stage, with $\hat\mu$ defined by (39.68), and calculate $V(\hat\mu)$ from (39.78); but in general











sample survey the 

the terms on the right of (39.78) will u RY: DE MGNs 

varying probabilities at every sta e Wore co mp i ic . . MS 

A completely general express^' (or w - Sm “ tl *y *11 reflett 

for the purpose of estimating sampli n . ls ^mally de . 

design which is important in practice '' M«n«hUe we e” bd °w 


sample 


Sampling with probability proportional to size

39.41 Inspection of (39.68) shows that -r *“* 

case at (39.70). To achieve this we mn, • n the e qual-prob a b;ii tlmator 

tth unit at the first stage be n x Ap/N- thVth ° nly , that the Probability of 

ing the yth unit from the ith first-smge unit b ?„ f Cond ^ of slet 

at the third stage of y ijk being selected be n A4® : and that *e probability 

3/ o- then have 


n m = «i~.n 2 lil’ Jb. 
N 



n 

N' 


It will be seen that within any Denubimat 0 9 . „ 

zt «r^Te *eS 

3U) 1 £“ iS ^ 

One simple and convenient choice is to make 


AP = 


Ni, 

s Afo, 

j=l 


= N,,,. 




Each unit at the first stage then has probability of selection proportional to the number 
of individuals it contains: the same is true at the second stage; and at the final stage, 
selection is with equal probabilities. We express this by saying that we sample with 
probability proportional to size (p.p.s.) at each of the earlier stages and with equal 
probabilities at the last stage. 

It is easy to see that for any number $p > 2$ of stages, overall selection probabilities
for every individual will be equal to $n/N$ if we sample with p.p.s. at all but the last
stage, where equal probabilities are to be used.

P.p.s. sampling was first theoretically investigated by Hansen and Hurwitz (1943),
and was actually the earliest form of explicit unequal-probabilities sampling.
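The claim that p.p.s. selection at the earlier stages, followed by equal probabilities at the last, gives every individual the same overall value $n/N$ can be checked by multiplying the stage factors. In the sketch below the function name and the nested list `sizes` (with `sizes[i][j]` standing for $N_{ij3}$) are assumptions of this example, and each factor of the form $n_r \times$ (probability at a single drawing) is treated as the selection probability at that stage; when drawings are made with replacement it is strictly the expected number of selections.

```python
from fractions import Fraction

def overall_probability(sizes, i, j, n1, n2, n3):
    """Product of the three stage factors for any individual in unit (i, j)."""
    N = sum(sum(unit) for unit in sizes)
    N_i = sum(sizes[i])                            # individuals in first-stage unit i
    first = n1 * Fraction(N_i, N)                  # p.p.s. at the first stage
    second = n2 * Fraction(sizes[i][j], N_i)       # p.p.s. within first-stage unit i
    third = Fraction(n3, sizes[i][j])              # equal probabilities at the last stage
    return first * second * third

sizes = [[3, 5], [4, 8, 10]]                       # N = 30
probs = {(i, j): overall_probability(sizes, i, j, n1=2, n2=1, n3=2)
         for i, unit in enumerate(sizes) for j in range(len(unit))}
print(probs)
```

Every entry equals $n/N$ with $n = n_1n_2n_3 = 4$, whatever the sizes of the units.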

pf = £ N ij3 /N at each drawing; then to select n, secon -stage uni s 

ment fro! each of the * selected first-stage units, 

at each drawing; and finally ‘° 'uni^with^qu^l probabilities 

each of the n,« 2 selected f <’Replacement at the two p.p.s. stages enables 

each drawing. T- he samp mg w hat follows, 

the simplified theory of 39.12 


to use 
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the 




We write the estimator

\[ \hat\mu = \frac{1}{n_1n_2n_3}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\sum_{k=1}^{n_3} y_{ijk} = \frac{1}{n_1n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} m_{ij}. \]

Its variance is given by (39.78), whose terms we now evaluate.

39.43 Because the final-stage sampling is with equal probabilities we have, as
in 39.37,

\[ V_3(m_{ij}) = \frac{\sigma_{ij}^2}{n_3}\Big(1-\frac{n_3}{N_{ij3}}\Big), \]

so that

\[ V_3(\hat\mu) = \frac{1}{n_1^2n_2^2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\frac{\sigma_{ij}^2}{n_3}\Big(1-\frac{n_3}{N_{ij3}}\Big). \]

Hence

\[ E_2V_3(\hat\mu) = \frac{1}{n_1^2n_2n_3}\sum_{i=1}^{n_1} E_2\Big[\frac{1}{n_2}\sum_{j=1}^{n_2}\sigma_{ij}^2\Big(1-\frac{n_3}{N_{ij3}}\Big)\Big]. \tag{39.87} \]

At the second stage, $n_2$ out of $N_{i2}$ units are selected within the $i$th first-stage unit,
with probabilities $N_{ij3}/\sum_j N_{ij3}$ at each drawing. Thus (39.87) becomes

\[ E_2V_3(\hat\mu) = \frac{1}{n_1^2n_2n_3}\sum_{i=1}^{n_1}\sum_{j=1}^{N_{i2}}\frac{N_{ij3}}{\sum_j N_{ij3}}\,\sigma_{ij}^2\Big(1-\frac{n_3}{N_{ij3}}\Big). \]

Similarly

\[ E_{12}V_3(\hat\mu) = \frac{1}{n_1n_2n_3N}\sum_{i=1}^{N_1}\sum_{j=1}^{N_{i2}} N_{ij3}\,\sigma_{ij}^2\Big(1-\frac{n_3}{N_{ij3}}\Big). \tag{39.88} \]

This is the first term on the right of (39.78). Further,

\[ E_3(\hat\mu) = \frac{1}{n_1n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\mu_{ij}, \tag{39.89} \]

\[ E_2E_3(\hat\mu) = \frac{1}{n_1}\sum_{i=1}^{n_1}\sum_{j=1}^{N_{i2}}\frac{N_{ij3}}{\sum_j N_{ij3}}\,\mu_{ij} = \frac{1}{n_1}\sum_{i=1}^{n_1}\mu_i, \]

where $\mu_i$, as previously, is the mean of $y$ in the $i$th first-stage unit. Thus

\[ V_1[E_2\{E_3(\hat\mu)\}] = \frac{1}{n_1N}\sum_{i=1}^{N_1} N_{i.}(\mu_i-\mu)^2, \tag{39.90} \]

where we now write $N_{i.} = \sum_j N_{ij3}$ for the number of individuals in the $i$th
first-stage unit. (39.90) is the last term on the right of (39.78). The middle
term is, from (39.89),

\[ E_1[V_2\{E_3(\hat\mu)\}] = E_1\Big[\frac{1}{n_1^2n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{N_{i2}}\frac{N_{ij3}}{N_{i.}}(\mu_{ij}-\mu_i)^2\Big] = \frac{1}{n_1n_2N}\sum_{i=1}^{N_1}\sum_{j=1}^{N_{i2}} N_{ij3}(\mu_{ij}-\mu_i)^2. \tag{39.91} \]

Putting (39.88), (39.90) and (39.91) into (39.78), we obtain

\[ V(\hat\mu) = \frac{1}{n_1N}\sum_{i=1}^{N_1} N_{i.}(\mu_i-\mu)^2 + \frac{1}{n_1n_2N}\sum_{i=1}^{N_1}\sum_{j=1}^{N_{i2}} N_{ij3}(\mu_{ij}-\mu_i)^2 + \frac{1}{n_1n_2n_3N}\sum_{i=1}^{N_1}\sum_{j=1}^{N_{i2}} N_{ij3}\,\sigma_{ij}^2\Big(1-\frac{n_3}{N_{ij3}}\Big). \tag{39.92} \]

39.44 It will be observed that (39.92) is almost exactly of the same form as (39.84),
the variance formula for equal-probabilities sampling in the symmetrical case. In fact,
apart from the simplification occasioned by the sampling being now with replacement
at the first two stages, the only difference is that the $N_{ij3}$ occur as weights in each component
of the variance, as they must do because of the unequal sizes of units. We
may write (39.92) in the same form as (39.84),

\[ V(\hat\mu) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1n_2} + \frac{\sigma_3^2}{n_1n_2n_3}, \tag{39.93} \]

with obvious definitions, and all the variances are between means, not totals as in
(39.82). Thus the present p.p.s. sample design has the effect of eliminating the influence
of the varying sizes of the units at the stages of sampling before the last.

Clearly a similar result will follow for any number of stages. The two-stage
result is obtained from (39.92) by putting $n_1 = N_1 = 1$ and making appropriate
changes in notation. The reader is asked in Exercise 39.24 to show that in this case,
on defining symbols obviously,


Estimation of sampling variance in multi-stage sampling

39.45 Although, as indicated at the end of 39.40, a general formula for $V(\hat\mu)$,
for completely arbitrary probabilities of selection at each stage, is lengthy and of no
particular interest, it is a remarkable fact that a general method for the unbiassed
estimation of the sampling variance of an estimator in multi-stage sampling is easily
obtained, and that it is of a very simple form.

Suppose that there is an arbitrary number of stages of sampling, and that we sample
$n_1$ units at the first stage, where the probability of the $i$th unit being
included among the $n_1$ is $\pi_i^{(1)}$, and the probability of both $i$th and
$j$th units being selected is $\pi_{ij}^{(1)}$. As at (39.73), we write the variance
of any estimator $\hat\theta$ as

\[ V(\hat\theta) = E_1\{V_{>1}(\hat\theta)\} + V_1\{E_{>1}(\hat\theta)\}, \tag{39.95} \]

where we use the omnibus suffix ">1" to represent all stages of sampling after
the first. We suppose that the estimator may be written in the form

\[ \hat\theta = \sum_{i=1}^{n_1} t_i. \]

(39.68) is of this form, and so more generally is (39.20) for any number of stages. If
sampling is carried out independently within the different selected first-stage units,

\[ V_{>1}(\hat\theta) = \sum_{i=1}^{n_1} V_{>1}(t_i), \]

and hence (39.95) may be rewritten

\[ V(\hat\theta) = V_1\Big\{\sum_{i=1}^{n_1} E_{>1}(t_i)\Big\} + \sum_{i=1}^{N_1}\pi_i^{(1)}\,V_{>1}(t_i), \tag{39.96} \]

using (39.19). Applying (39.18) to the first term on the right of (39.96), and expanding
the other term, we obtain

\[ V(\hat\theta) = \sum_{i=1}^{N_1}\pi_i^{(1)}(1-\pi_i^{(1)})\{E_{>1}(t_i)\}^2 + \mathop{\sum\sum}_{i \ne j}(\pi_{ij}^{(1)}-\pi_i^{(1)}\pi_j^{(1)})\,E_{>1}(t_i)\,E_{>1}(t_j) + \sum_{i=1}^{N_1}\pi_i^{(1)}\big[E_{>1}(t_i^2)-\{E_{>1}(t_i)\}^2\big] \tag{39.97} \]

\[ = \sum_{i=1}^{N_1}\pi_i^{(1)}(1-\pi_i^{(1)})\,E_{>1}(t_i^2) + \mathop{\sum\sum}_{i \ne j}(\pi_{ij}^{(1)}-\pi_i^{(1)}\pi_j^{(1)})\,E_{>1}(t_i)\,E_{>1}(t_j) + \sum_{i=1}^{N_1}\{\pi_i^{(1)}\}^2\,V_{>1}(t_i). \tag{39.98} \]


39.46 We now seek an unbiassed estimator of (39.98). So far as the first two terms on its right are concerned, we need only substitute $t_i^2$ for $E_{>1}(t_i^2)$ and $t_i t_j$ for $E_{>1}(t_i)\,E_{>1}(t_j)$, since $t_i$ and $t_j$ are independent. The first two terms then take the form of the first-stage variance formula (39.18) with the $t_i$ treated as fixed values. If $\hat V_1(\hat\theta)$ denotes the unbiassed estimator of that formula, it follows that we have

$$E\{\hat V_1(\hat\theta)\} = \sum_{i=1}^{N_1} \pi_i^{(1)}(1-\pi_i^{(1)})\, E_{>1}(t_i^2) + \mathop{\sum\sum}_{i \neq j} (\pi_{ij}^{(1)} - \pi_i^{(1)}\pi_j^{(1)})\, E_{>1}(t_i)\, E_{>1}(t_j). \qquad (39.99)$$

The last term on the right of (39.98) is dealt with easily, since

$$E_1\Big\{\sum_{i=1}^{n_1} \pi_i^{(1)} V_{>1}(t_i)\Big\} = \sum_{i=1}^{N_1} \{\pi_i^{(1)}\}^2\, V_{>1}(t_i)$$

by (39.19). Thus, using (39.99), we see that (39.98) is the expected value of

$$\hat V_1(\hat\theta) + \sum_{i=1}^{n_1} \pi_i^{(1)} V_{>1}(t_i). \qquad (39.100)$$

However, (39.100) is not a statistic, since we have yet to estimate the $V_{>1}(t_i)$. If $\hat V_{>1}(t_i)$ is unbiassed for $V_{>1}(t_i)$, we finally have the unbiassed estimator of (39.98)

$$\hat V(\hat\theta) = \hat V_1(\hat\theta) + \sum_{i=1}^{n_1} \pi_i^{(1)} \hat V_{>1}(t_i). \qquad (39.101)$$

(39.101) expresses the rule originally formulated by Durbin (1953) after an earlier more specialized statement by Yates. We state it in words: an unbiassed estimator of sampling variance in multi-stage sampling, when the first-stage sampling is without replacement, is obtainable as the sum of two components. The first component estimates the variance as if only the first stage of sampling were involved; the second component is the weighted sum of the estimates, within the selected first-stage units, of the variances due to later stages of sampling (the first-stage units being regarded as fixed); the weights are the probabilities of selection of these first-stage units.
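The two-component rule may be illustrated by a minimal Python sketch. It rests on assumptions not taken from the text: the first component is computed in the Horvitz-Thompson form, with the $t_i$ treated as fixed (whether this coincides with the first-stage estimator intended in the text is not confirmed), and the function names `first_stage_component` and `yates_durbin_variance` are hypothetical.

```python
def first_stage_component(t, pi, pi_joint):
    """First component: variance estimate as if only first-stage sampling occurred.

    Assumed Horvitz-Thompson form, with theta-hat = sum(t):
      sum_i (1 - pi_i) t_i^2 + sum_{i != j} (pi_ij - pi_i pi_j) / pi_ij * t_i t_j
    where the sums run over the selected first-stage units only.
    """
    n = len(t)
    total = sum((1.0 - pi[i]) * t[i] ** 2 for i in range(n))
    for i in range(n):
        for j in range(n):
            if i != j:
                weight = (pi_joint[i][j] - pi[i] * pi[j]) / pi_joint[i][j]
                total += weight * t[i] * t[j]
    return total


def yates_durbin_variance(t, pi, pi_joint, v_later):
    """Sum of the two components of the rule.

    v_later[i] is an unbiassed estimate of the later-stage variance of t[i],
    computed within the i-th selected first-stage unit; it is weighted by
    that unit's first-stage selection probability pi[i].
    """
    second = sum(p * v for p, v in zip(pi, v_later))
    return first_stage_component(t, pi, pi_joint) + second


# Illustrative data (invented): 3 of 10 first-stage units chosen with equal
# probabilities without replacement, so pi_i = 3/10 and pi_ij = (3*2)/(10*9).
t = [10.0, 14.0, 9.0]
pi = [0.3, 0.3, 0.3]
pi_joint = [[1.0 / 15.0] * 3 for _ in range(3)]  # diagonal entries are unused
v_later = [2.0, 1.0, 3.0]
estimate = yates_durbin_variance(t, pi, pi_joint, v_later)
```

For equal-probabilities sampling the first component reduces to $n_1(1 - n_1/N_1)$ times the sample variance of the $t_i$, which gives a convenient check on the general function.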


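39.47 The expression (39.101) may be broken down into further components to facilitate its use. If we write $t_i = \sum_{j=1}^{n_{i2}} t_{ij}$, we may apply (39.101) itself to the terms $\hat V_{>1}(t_i)$ and obtain

$$\hat V_{>1}(t_i) = \hat V_2(t_i) + \sum_{j=1}^{n_{i2}} \pi_{ij}^{(2)}\, \hat V_{>2}(t_{ij}), \qquad (39.102)$$

where $\pi_{ij}^{(2)}$ is the probability of selecting the $j$th second-stage unit in the $i$th first-stage unit. Substituting (39.102) into (39.101), we obtain

$$\hat V(\hat\theta) = \hat V_1(\hat\theta) + \sum_{i=1}^{n_1} \pi_i^{(1)} \hat V_2(t_i) + \sum_{i=1}^{n_1} \pi_i^{(1)} \sum_{j=1}^{n_{i2}} \pi_{ij}^{(2)}\, \hat V_{>2}(t_{ij}). \qquad (39.103)$$

The pattern for further extension is now clear. For $p > 2$ stages, the result is

$$\hat V(\hat\theta) = \hat V_1(\hat\theta) + \sum_{i=1}^{n_1} \pi_i^{(1)} \hat V_2(t_i) + \sum_{i=1}^{n_1} \pi_i^{(1)} \sum_{j=1}^{n_{i2}} \pi_{ij}^{(2)} \hat V_3(t_{ij}) + \cdots + \sum_{i=1}^{n_1} \pi_i^{(1)} \sum_{j=1}^{n_{i2}} \pi_{ij}^{(2)} \cdots \sum \pi_{ij\ldots}^{(p-1)}\, \hat V_p(t_{ij\ldots}). \qquad (39.104)$$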
i r 
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,vanced w onIy 


SXe *ey *—*- s case for which V(fl)^ 

C„„ ^ „r> <«> 

^ 1+2 iw.w.'-'tCJ ^■■■ ' 


J 


/ ^ \ v I n\ 

=lM + -^ 


where 


*r“ 


, S (^ij,.. • i 
« = 1 


—■ ttlij 


Kj «2 


n, «» 

1__ s 2 

is the sample correspondent of its population counterpart. (39.105) has the same structure as (39.85), save only that each term after the first is multiplied by the product of earlier-stage sampling fractions. Here again, as at the end of 39.47, we see that if $n_1/N_1$ is negligible,


m * ^ 


(39.106) 


irrespective of the methods of sampling used at later stages than the first. 

39.49 If we are sampling with replacement at the first stage, (39.104) must be modified to take account of the replacement of (39.19) by (39.33). It will be sufficient to reconsider the derivation of (39.101), from which (39.104) followed. We first note that since (39.96) depended upon later stages of sampling being carried out independently in the different selected first-stage units, we must now insist that if a first-stage unit is selected $r > 1$ times, the later stages of sampling must be carried out $r$ times independently within it.

is ,oZ h T. !? the eff T 0f m Sampling With -P'-ement on (39.18) 

may therefore'absorb the term in J fromthe ITT" 8 d ° Uble summati °n. We 

P9.97) must be replaced b ” ‘ ° ^ SUmmatl(> " into the second. Thus, 



(39.107) 
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V(fj) = jJ VII ,2 , A, . expected value of 


V(S) = s' „(!>,a , yy , „ ™ 

(39.108)

that here the unbiassed estimating statistic is simply

$$\hat V(\hat\theta) = \hat V_1(\hat\theta) \qquad (39.109)$$

instead of (39.101). No contribution arises from the later stages of sampling, which influence the value of $\hat V_1(\hat\theta)$, but not its form. This result also replaces the formula (39.104). Thus the Yates-Durbin rule given at the end of 39.46 simplifies if sampling at the first stage is with replacement: only the first component given by the rule should be calculated.

If first-stage sampling is with equal probabilities, (39.109) reduces to (39.106), which now holds exactly. More generally, in view of the last paragraph of 39.47, we see that (39.109) may be regarded as the limiting form of (39.104) when all $\pi_i^{(1)} \to 0$, just as the estimator of variance in single-stage sampling with replacement may be derived from the without-replacement formula.

39.50 If the probabilities of selection are the same at each first-stage drawing, the general formula (39.109) can actually be explicitly written down, for the estimator of variance in one-stage unequal-probabilities sampling with replacement has already been given for that case in 39.12. Here, the estimator is $\hat\theta = \sum_{i=1}^{n_1} t_i$, instead of (39.34), so that instead of (39.36) we have

$$\hat V(\hat\theta) = \hat V_1(\hat\theta) = \frac{n_1}{n_1 - 1} \sum_{i=1}^{n_1} (t_i - \bar t)^2, \qquad (39.110)$$

a remarkably simple form for estimating the sampling variance in multi-stage sampling with any number of stages when the first stage, with replacement, uses the same unequal probabilities at each drawing; the other stages are arbitrary, apart from the independent sampling condition in the first paragraph of 39.49.
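As a hedged illustration of (39.110), the short Python sketch below computes the estimator from the values $t_i$ obtained at the $n_1$ first-stage drawings. The function name `wr_variance_estimate` is hypothetical, and the $t_i$ are assumed to have been formed already so that the estimator is their sum.

```python
def wr_variance_estimate(t):
    """Variance estimate n1/(n1-1) * sum (t_i - t_bar)^2 for theta-hat = sum(t).

    Assumes the first stage was sampled with replacement using the same
    probabilities at every drawing, and that a unit drawn r times was
    sub-sampled r times independently, so that each drawing supplies its own t_i.
    """
    n1 = len(t)
    if n1 < 2:
        raise ValueError("at least two first-stage drawings are needed")
    t_bar = sum(t) / n1
    return n1 / (n1 - 1) * sum((ti - t_bar) ** 2 for ti in t)


# Invented values for three first-stage drawings.
estimate = wr_variance_estimate([10.0, 14.0, 9.0])
```

No later-stage variance estimates enter the calculation, which is the practical attraction of the with-replacement design.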

Minimum variance allocation in multi-stage sampling 

39.51 We first confine ourselves to the situation where, at each stage of sampling, the same number of units is selected from each previous-stage unit. (For three stages this means that $n_{i2} = n_2$, $n_{ij3} = n_3$.) In this case, both the general equal-probabilities formula (39.82) and the p.p.s. result (39.92) are of the form


$$V = \frac{v_1}{n_1} + \frac{v_2}{n_1 n_2} + \frac{v_3}{n_1 n_2 n_3}, \qquad (39.111)$$


where the $v_l$ are functions of population quantities only. In many applications, a fairly realistic cost function for three-stage sampling is

$$C = c_0 + n_1 c_1 + n_1 n_2 c_2 + n_1 n_2 n_3 c_3, \qquad (39.112)$$

where $c_0$ is overhead cost and $c_l$ is the cost of sampling a single unit at the $l$th stage.










It follows, since (39.111-12) are of the form (39.50-1), that $V$ is minimized for fixed $C$ (or $C$ for fixed $V$) by taking

$$n_1 n_2 \cdots n_l \propto (v_l / c_l)^{1/2}, \qquad l = 1, 2, 3. \qquad (39.113)$$

Taking ratios of (39.113) we have

$$n_{l+1} = \left(\frac{c_l\, v_{l+1}}{c_{l+1}\, v_l}\right)^{1/2}, \qquad l = 1, 2. \qquad (39.114)$$

$n_1$ is then determined by (39.114) and whichever of (39.111-12) is fixed. This is a notable result, for it implies that later-stage sample sizes are determined by variances and costs in a multi-stage sample, whatever the total sample size $n_1 n_2 n_3$, so that if the amount of money available (or the accuracy required for the estimator) changes, only $n_1$ should be changed. The result clearly holds for any number of stages $p \geq 2$; (39.113) then holds for $l = 1, 2, \ldots, p$, and the $(p-1)$ ratios of successive values of $l$ determine $n_2, n_3, \ldots, n_p$, leaving $n_1$ to be fixed by cost or accuracy considerations as above.
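The allocation just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the $v_l$ and $c_l$ are taken to be known and positive, the cost function is of the form (39.112), the resulting sample sizes are treated as real numbers (in practice they must be rounded and cannot exceed the numbers of units available), and the function names are hypothetical.

```python
from math import sqrt


def later_stage_sizes(v, c):
    """Return [n_2, ..., n_p] with n_{l+1} = sqrt(c_l v_{l+1} / (c_{l+1} v_l)).

    v and c are lists [v_1, ..., v_p] and [c_1, ..., c_p] of positive numbers.
    """
    return [sqrt(c[l] * v[l + 1] / (c[l + 1] * v[l])) for l in range(len(v) - 1)]


def first_stage_size_for_budget(budget, c0, c, later):
    """Solve C = c0 + n1*(c_1 + n_2 c_2 + n_2 n_3 c_3 + ...) for n1."""
    cost_per_first_stage_unit = 0.0
    product = 1.0  # running product n_2 * n_3 * ... * n_l
    for l, c_l in enumerate(c):
        if l > 0:
            product *= later[l - 1]
        cost_per_first_stage_unit += product * c_l
    return (budget - c0) / cost_per_first_stage_unit


# Invented three-stage example.
v = [4.0, 9.0, 16.0]
c = [16.0, 9.0, 4.0]
n2, n3 = later_stage_sizes(v, c)
n1 = first_stage_size_for_budget(1000.0, 0.0, c, [n2, n3])
```

Changing the budget in the last call alters only `n1`, while `n2` and `n3` are unaffected.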

39.52 The result of 39.51 is concerned with the best choice of the (equal) various-stage sample sizes for given probabilities of selection, i.e. the sample design is fixed with only sample sizes at choice. We now ask a much more difficult question, following Hansen and Hurwitz (1949): which choice of probabilities of selection will minimize sampling variance for fixed cost? We saw in 39.8 that if probabilities of selection are proportional to the values of the variable, sampling variance would be identically zero. The multi-stage problem is more complicated, as the general variance formula (39.98) indicates, for later-stage probabilities come into the reckoning. However, if sampling is with replacement at the first stage, (39.97) is replaced by (39.107). Furthermore (as the reader is asked to show in Exercise 39.27) if the same set of probabilities is used at each first-stage drawing, use of (39.32) reduces (39.107) to


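$$V(\hat\theta) = \sum_{i=1}^{N_1} \pi_i^{(1)} \{E_{>1}(t_i)\}^2 - \frac{1}{n_1}\Big\{\sum_{i=1}^{N_1} \pi_i^{(1)} E_{>1}(t_i)\Big\}^2 + \sum_{i=1}^{N_1} \pi_i^{(1)} \big[E_{>1}(t_i^2) - \{E_{>1}(t_i)\}^2\big]$$

$$= \sum_{i=1}^{N_1} \pi_i^{(1)} E_{>1}(t_i^2) - \frac{1}{n_1}\Big\{\sum_{i=1}^{N_1} \pi_i^{(1)} E_{>1}(t_i)\Big\}^2.$$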

s depends only on the si ( / } at the first stage. 


(39.115) 




39.53 We now restrict ourselves to two-stage sampling, using constant-probabilities drawings with replacement at the first stage (so that (39.115) holds), and equal-probabilities sampling at the second stage, and to a self-weighting design (see 39.41), in which the probability of selection for every individual in the population is the same. This is two-stage p.p.s. sampling as in 39.42, and
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I 


n i2 = n 


^ 1 ) * jy-* 

We use a slightly more general cost-function than the two-stage equivalent of (39.112),

$$C = c_0 + n_1 c_1 + c_2 \sum_{i=1}^{n_1} N_{i2} + c_2' \sum_{i=1}^{n_1} n_{i2}, \qquad (39.117)$$

which allows two components of cost at the second stage, one proportional to the total size of the first-stage units sampled and the other to the size of the second-stage sample. ($N_{i2} c_2$ may be regarded as the cost of "preparing" the $i$th first-stage unit for the next stage of sampling.) If $c_2 = 0$ and $n_{i2} = n_2$, all $i$, we return to the form (39.112). By (39.116), (39.117) may be written


c = c„+B lCl+ 2 N ( n 4 \ 
i-1 *V JVxS"/ 


(39.118) 

However, (39.118) is a random variable which we cannot evaluate in advance, so we work instead with its expectation

$$E(C) = c_0 + n_1 c_1 + c_2 \sum_{i=1}^{N_1} \pi_i^{(1)} N_{i2} + n c_2'$$

(using (39.33)), which we rewrite, since $\sum_{i=1}^{N_1} \pi_i^{(1)} = n_1$ by (39.32),

$$E(C) - c_0 - n c_2' = \sum_{i=1}^{N_1} \pi_i^{(1)} (c_1 + N_{i2} c_2). \qquad (39.119)$$


Because the sample is self-weighting, we know from 39.41 that 


so that from (39.116) 


”l 1 «t Ills 

fi = 2 ti = i 2 2 y„ = «, 

i=l Wi=lj=l 


1 WlS 


1 » JV (i 

4 tij-t 3 '"' Nrt"'n a )Z i 


- •*» JV it ? 1 


.( 1 )> 


say. Thus (39.115) becomes, in this case, 

iV 2 F(/2)+-l 2 E (#««*)]* = S ^ {(#«»**)’}• 
v «! (t=l >1 J • 1=1 n i >X 


(39.120) 


39.54 We thus see that the expected cost 
variance is a linear function of their leciprocals. re nlaced here by the 

(39.50-1), so that the argument of 39.20 hoi s- good^ once from ( 39 . 53 ) 

> the C[ by {ci+N i2 c 2 ) and the v t by {E( t 2 1 ) j 

that the <!' which minimize V{fi) for feed E(C) are given by 

{i(W) 2 .JV,. 

W') 2 * *+jvT’ 


(39.121) 






3M 1 ”the right is the total '“‘ ^“Tel'sentialiy reflects the varilb^ 

° { thC 8 ° ne W ° U " 

expect 


it.. a -r , = 0 ( 39 . 121 ) then reduces effectively to 
s !> «»<* “ C! * ... „ f the v., in the ith mu. 


v and it £2 ’ v 

If = N„, JV«»i s fj* , the total of the y {j in the fth u„ it 

_ lllt . „v must be proportional ^ ^ m . vary little, we shall 

3^-s«srA--—- 

39.55 The evaluation of the relative efficiencies of multi-stage and one-stage random sampling, and even more the estimation of their efficiencies from a multi-stage sample, is in general much more complicated than the corresponding problem for stratified sampling, which was treated earlier in this chapter for a number of special cases.

It is extremely rare for a multi-stage sample to reduce sampling variance, and indeed the motive for multi-stage sampling is almost invariably to reduce costs rather than reduce variance directly; the saving in cost can then be applied to an increase in sample size. The result of 39.51 makes our point clear; there, one-stage random sampling is seen to be most efficient only if the solution of (39.114) is that $n_2$ and $n_3$ be as large as possible, i.e. $n_2 = N_{i2}$, $n_3 = N_{ij3}$. Since the $N_{i2}$ and $N_{ij3}$ are usually themselves very large, such a solution requires very large values for the cost and variance ratios in (39.114), which are almost never found in practice.


39.56 Finally, we mention briefly that the benefits of stratification, which we discussed for single-stage sampling in 39.13-27, apply at every stage of a multi-stage sample. Practical multi-stage sample designs therefore frequently incorporate stratification, particularly at the first stage, which often contributes most to the sampling variance. All the foregoing theory applies separately within each stratum, including the Yates-Durbin rule of 39.46 and 39.49 for estimating variance.


» 


with N members, and fro ™ a P°P ul 

the mean of these d distinct values is an unhia^rt f mcIuded in tIle sample. Show 
Its variance is smaller than that of the overall samnfe^‘7 ° f popuIation mea ". and 
by proving the inequality P mean for ”> 2 . and equal to it for n 



(Raj and 
holds if 


Khamis, 1958. The same result 
ls ed and n a random variable. 


) 







: second 


sample survey T HEORy . n iP 

39.2 Two units are selected from a popul a t io ESlG NS 

am ** ,he ith Unit at ,he fiKt being I W ; ,S ’**« «P>scm* M 205 

Rawing •* made P™P»«ional ,o <-i ‘ ' >' T he probability ‘ P ' 0b - 

«the Second 

( 2 )^ =p 3 .( 1_ 1 \ 

i( th e ith unit was selected at the first drawm* Pl/ 

Show that *»■ 

N 

<2 )Pj = 1+ £ 

j'*i *=i 1 -2p fc ’ 

that pmPi 1S symmetric in i and j, an d therefore th ,+ 

that ,be ith um t is selected at the second drawing^ if3n' Ptobabiliw 

/ 1 » ni = 2 Pi and 

wi 4 )hrfy-' 

(Durbin, 1965) 

39.3 In (39.23), show that if the probabilities $\pi_i$ are proportional to the values $y_i$ of the variable (which are taken to be positive), $V(\hat\mu) = 0$. Show that the estimator (39.24) of the variance is also equal to zero, but that (39.22) becomes

= A\-~ 2 S ^l_m\ 

k b- n2 i,i=1 n i} Nftj 

I i*J 

f where * is the sample mean. Hence show that can take negative values. 

39.4 In (39.24), show that if 7i y j = n 2 j = — _^for all j ^ 1,2, and n = 2 with y h y 2 observed, 


■rj' : Sr - ** 


■Cr /a\ ( n n + d ) 2 - 7l u , 

VM “ <*"*>*• 


Hence show that can take negative values. 


*v J 


(Durbin, 1953) 


39.5 Show that if sampling without replacement with unequal probabilities is carried out so that at the first drawing the $i$th individual has probability of selection equal to $p_i$ ($> 0$, $\sum p_i = 1$), while at all subsequent drawings probabilities of selection are equal, we have, in the notation of 39.6,

$$\pi_i = \frac{1}{N-1}\{(N-n)p_i + (n-1)\},$$

$$\pi_{ij} = \frac{n-1}{(N-1)(N-2)}\{(N-n)(p_i + p_j) + (n-2)\},$$

and hence that the estimator (39.24) is always positive for this selectio ^ r ^ n> 1953) 
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206 and the sampling . g carrie d out with the Qy 

39.6 Show ,' ha, j f ir f E « 3 rcise 39.5, * g^drawfag, « havC ' ’ 

... c—t rlrawing as in as at the n 

i 1 


at the first drawi ^f e ^"proportion as 
obabilities in the same y 


pn 


nij = fiiPi —pi 1 




/ i v 



ond tha . „,e — - - - r ntau ^r:S ; when 


< 


1 -pr (N-2) (* se i ec tion scheme. 

Hence show that the estimator (39.24) is always positive or (A . R. Sen, 1953) 

_ an c show that for any set of pi we must ha Ve 

39 . 7 For .he selection schente of Exerc.se 39.5, ., mitv . 


m 


> nzl > all /, and using (39.16) show that only one m 


at most equals unity. 


N -1 


and (39.24) reduce to the estimator of variance V 

39.9 Show that the * defined at (39.25) have variance 


yfu) 


v, v -'(*0 

VM Pm m Pm "' <»-» <“> *-> 

. . . £ /a^O— 2 3'o)l’ 

(1) (2) (k- 1 ) V. r- 1 J 

where each summation is over all available units at the indicated drawing. Using this result, 
show that for « = 2, the statistic (39.27) has variance 


V (z) = l[/s^-N>^l+{sf ( i ) S^-SPa) W-ym)* 
4 Ll(l) fi(l) J 1(1) (B)fi(2) (1) 


(Raj, 1956) 


39.10 Show that if $z$ defined at (39.26) is to reduce to $N$ times the sample mean $m$ when sampling is with equal probabilities, the weights must satisfy


c» _ iV^-D 

c t ~ (iV—2)(' < ~i) > 2 ^ u ^ n > 

these (n-1) conditions, together with Ec u = 1, determining the weights uniquely. 

(cf. Raj, 1956) 

39.11 To select a sample of n individuals from a nonulation nf AT mrijr/j i 
***** *(,. 12 .JVJ, consider .he fol,owing P p“o" ” ^ 

0 “ d a number r between 

.he integers 1 to AT. If r Sl , acCv for iho , -f' eger *' by the same P»oce» 

(2) Select further values r, successively withoutTeplacement’f'’ "T* th ' S entire operati “- 
this sequence accent for th , u . replacement from the integers 1 to N. In 

■xceeds on 7 * for *e cumuiative 

eds one of the value. M f 2M t 3M, . . . , (n~] )M. 


\ 


( 


sum r+ £ n u first 

*1 — 2 
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replacemen , operation ( 1 ) repeated «ti mes ** rec > uir ^ »„ without 

(Laffer* p th re Pla c ement 

39.12 * 1 . **»•••»*• are Correlated variates ’U 951 ’ Cjrund Y, 1954) 

necessarily equal. Show that. - £*,„ „ an -a „ and ^ ^ 

variance is unbiassedly estimated by S 1)} ° f Wd ,hat its 

39.13 Show that the generalizations of binom* i 

see (?27) o£ “ random -»■** 

39.14 In 39.15-18, show that 

D = V(m n ) - V(fl m ) _ S N ~n) 

J nN{N~\)i Nl (l l i~l i y 

1 

nN 2 (N~\) (P ~-~ R > 

where 

P = NiN^NtoZ-ftN^y-}^ 0> 

Q = «(£ Nrtrf—iVS<T()<0, 

R = N 2 11 of - (S Ni ffj) 2 > 0, 

P = 0 holding if and only if all cr f are equal. Hence mW HQ aan 0 u .l 

with fixed, the relationship (39.49) holds * ( } ’ ^ that aS ™ 

Show further that if P = 0 and * is small enough, D is negative, and that if N-n is also 
small enough, 

V(vir) < F(/1mv)- 

(Armitage, 1947) 


-■'m 


39.15 In Exercise 39.14, show that if the $\sigma_l$ are sufficiently unequal, $P - R$ will often be positive, and that then $D$ is a decreasing function of $n$, so that the reduction in variance through stratification declines as $n$ increases. Hence show that if any $n_l$ in the MV allocation exceeds the corresponding $N_l$, we should increase the gain from stratification by putting $n_l = N_l$ and distributing only the $n - N_l$ other observations by the MV allocation.

(Armitage, 1947)


39.16 Show that for any sample design in which the sample mean $m$ is unbiassed for the population mean $\mu$,

$$E\Big\{\frac{1}{n}\sum_{i=1}^{n} (y_i - m)^2\Big\} = \sigma^2 - \operatorname{var} m,$$

where $\sigma^2 = E(y-\mu)^2$. Verify the result for random sampling with equal probabilities without replacement.

(The result is due to L. Kish.)
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208 THE ADVA _ 2 strata , and the whole of the second str 

39,7 in 39.24, show that « *££ u ^ g - P oi„« ft ■» *» ^ ° f > ‘° £ 

is sampled (», = », *e best cho.ce 

sampling variance (39.3V) is fe \ J 


c x = A*i + 


O' 


O'!* 


(cf. Dalenius, 1952 ) 


to be the same in all strata, the cutting-points i n the 
39.18 Show that if sample : «z (rf 3 , 25) by 

range of y which minimize ( r , o 4-(a — 

S 2 ) = JVz+lK+l + V* 




nize (39.39) are given — ' 

(Cochran, 1961) 


39,9 A large random rst^d^to^^po^ 

^^r^-^seSnXc^ ejected value 


SftSSJl'ISrS. —> - approximately equal to F0W) givcn 
at (39.45). 


39.20 Show that if stratum sample sizes are chosen to minimize the variance of the estimated sampling variance (39.40), and $n_l/N_l$ is negligible, the MV allocation formula (39.42) is replaced by

$$\frac{n_l}{n} = \frac{N_l\, \sigma_l\, (\beta_{2l}-1)^{1/4}}{\sum_l N_l\, \sigma_l\, (\beta_{2l}-1)^{1/4}},$$

where $\beta_{2l}$ is the moment-ratio $\mu_4/\mu_2^2$ for the $l$th stratum. This allocation will therefore differ from (39.42) unless $\beta_{2l}$ is constant, all $l$.

(Ross, 1961)


39.21 Show that if stratum sample sizes $n_l$ differ from those defined by the MV allocation formulae (39.42) by amounts $\Delta n_l$, $V(\hat\mu)$ is increased by approximately the factor

$$1 + \frac{1}{n} \sum_l \{(\Delta n_l)^2 / n_l\}.$$
n 1 


of shtl S efe S «ed 3 (3 2 9’7 S ! ,0 I« a ,V n '‘“T Samp '. ing W “ h equaI clus,er sizes » nd * single cluster 
ze n selected, (39./) gives the sampling variance of the simt-de 1 • ., 

class correlation coefficient (cf. 26.25-6, Vol. 2) for clusters. P ’ P bemg the mtra " 


are 


^/ca h r „r;: h ; t “° g :ani samp,in f“ in 511 

h, nencc cienve the variances at (5.29) and (5.30). 


39.24 Establish (39.94) from (39.92). 


39.25 Deduce (39.105) as a special case of (39.104).


39.26 Use Exercise 39.19 to derive the result (39.110) for the estimation of variance in multi-stage sampling where the first stage is sampled with unequal probabilities with replacement, with the same set of probabilities at each drawing.

(cf. Durbin, 1953)
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i2 7 Using ( 39 . 32 ), show that when tK ^SlGMg 

drawing ««h replacement, the m 1(idle 

_lf& , nght »t ( 39 .ll* 

.. . *«.«»•. 




(39.107) each ftrst- 


* > 1 N/ r> 

i ^ence that the difference between the * v. 

I £ withouwep “ varUnce ( ; 9 / 7 > * «p^trr -**■. »*,» w ^ 

„ n d that this may be positive or negative Tf c ** J >X 

tase show that D> 0. ‘ If sa mpling i s with 

S Show that if the with-replacement estimator of varian 39 * ' lUCS at the first 

actually carried out without replacement, it has bias exactly e ^ When samplin 8 »s 

out-replacement sampling has the smaller variance (D>0\ t0 ^~^ D ’ S ° that if with ' 

tends to overestimate the variance. )( USe of the with-replacement estimator 

(Durbin, 1953) 

39.28 In a multi-stage design, sampling at the sth sta^ ( ->n • 

sampling at the (s + l)th stage is with equal probabilities § SW r ? ^Placement, and 
any unit selected r times at the sth stage has its (s-Mlth-stacr* , Cf ' Exercise 39 ' 27 ) that if 
variance of the estimator (39.68) of the population mean is lessXn VrinT^d^ ? 
stage samples had been selected within the unit. r lndependent (*+!)*- 

39.29 In a multi-stage design, sampling at the $s$th stage ($s > 1$) is with replacement. Show that if any unit selected $r$ times at this stage has the $(s+1)$th stage of sampling carried out only once within it, and a weight of $r$ given to the results, the variance of the estimator (39.68) of the population mean is greater than if $r$ independent $(s+1)$th-stage samples had been selected within the unit.

39.30 In sampling with unequal probabilities without replacement of $n$ individuals from a population of $N$, the probability that a given sample of individuals is selected in a particular order is $p_{(s)}$, and the probability that the same sample is selected in any order is $p_s = \sum p_{(s)}$, the summation being over the $n!$ possible orderings; $\sum_s p_s = 1$, where the summation is over the $\binom{N}{n}$ possible samples. If $z_{(s)}$ is a statistic which may take account of the order of selection, and $z_s = \sum p_{(s)} z_{(s)} / p_s$, show using (39.72-3) that

$$E(z_s) = E(z_{(s)})$$

and

$$V(z_s) \leq V(z_{(s)}),$$

. Tjs 0 this result to show thsit 

the last equality holding only when AV ralues of «»£££££ that f(a ( .,) defined at ( 39 . 31 ) 

be^ imp roved Vpon ^ (rf N . Murth y, 1957, and Fathak, 19«a) ^ 

39.3, in sampling with .^'SSoI'X < “ >* 

population individuals have probabihties 
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2. N, and that at Show that (39.25) may then be 

remain in the same proportions as at tne 

Dm 


3^(1) 

Zl = —7 > 

(l)P(D 


yoo 


and hence that the improved version of *(s) given by Exeicise 

n 


- V(d pv}. 


may be written 


/i« 

Z$ = 2 yips\i/ps 
1 


t=l 

where p s \i is the conditional probability of selecting the observed sample given that yi is selected 

firSt * /l\/T M T\/Tn i-d-l-.tr inr-,,. 
CHAPTER 40

40.1 We now turn to a problem which arises whatever the sample design may be, namely the improvement of the efficiency of estimation.

In 39.8 and 39.13 we touched upon the fact that knowledge of a variable highly correlated with that being studied may assist us to choose probabilities of selection, or to construct strata, to make the sampling variance of the estimator small. Such supplementary information concerning an auxiliary variable may also be used directly to change the form of the estimator in order to improve its efficiency.

Ratio estimators and their modifications 

40.2 Suppose, as in 39.2, that in sampling a finite population with equal probabilities without replacement we wish to estimate the population mean of $y$, which we now write $\mu_y$, but that we know the value of the population mean $\mu_x$ of another variable $x$, and observe $x$ as well as $y$ for the sample values. We clearly ought to be able to turn this extra knowledge to good account. We assume $\mu_x \neq 0 \neq \mu_y$.

Two intuitively reasonable estimators of $\mu_y$ are

$$\hat\mu_y = \mu_x\, m_y / m_x \qquad (40.1)$$

and

$$\tilde\mu_y = \mu_x\, m_{y/x}, \qquad (40.2)$$

where $m$ denotes the sample mean of the variable which is its suffix. (40.1) uses the ratio of sample means, and (40.2) the mean of sample ratios, of $y$ and $x$ as a "correction factor" to the known $\mu_x$.
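The two estimators (40.1) and (40.2) may be computed as in the following minimal Python sketch; the function names are hypothetical, and the sample values of $x$ are assumed to be non-zero so that the individual ratios exist.

```python
def ratio_of_means(x, y, mu_x):
    """Estimator (40.1): mu_x times the ratio of the sample means m_y / m_x."""
    m_x = sum(x) / len(x)
    m_y = sum(y) / len(y)
    return mu_x * m_y / m_x


def mean_of_ratios(x, y, mu_x):
    """Estimator (40.2): mu_x times the sample mean of the ratios y_i / x_i."""
    ratios = [yi / xi for xi, yi in zip(x, y)]
    return mu_x * sum(ratios) / len(ratios)


# Invented sample; mu_x is the known population mean of x.
x = [1.0, 2.0]
y = [1.0, 4.0]
a = ratio_of_means(x, y, mu_x=2.0)   # uses m_y / m_x = 2.5 / 1.5
b = mean_of_ratios(x, y, mu_x=2.0)   # uses the mean of 1.0 and 2.0
```

The two functions agree when every sample ratio $y_i/x_i$ is the same, and in general differ, as the invented sample shows.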

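The expectations of (40.1-2) follow at once from observing that by the definition of a covariance $C$,

$$C\Big(\frac{m_y}{m_x}, m_x\Big) = E(m_y) - E\Big(\frac{m_y}{m_x}\Big) E(m_x) = \mu_y - E(\hat\mu_y),$$

so that

$$E(\hat\mu_y) = \mu_y - C\Big(\frac{m_y}{m_x}, m_x\Big). \qquad (40.3)$$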

« This is to be distinguished from the me,olEsupfdementan, “^sTo 

variable), as in 29.33-46, Vol. 2, to achieve Covariance (cf. 35.67-8) In so far as 

to the use of a concomitant variable in the A y 
the latter reduces residual variation. 
Similarly

$$C\Big(\frac{y}{x}, x\Big) = E(y) - E\Big(\frac{y}{x}\Big) E(x) = \mu_y - E(\tilde\mu_y),$$

so that

$$E(\tilde\mu_y) = \mu_x E\Big(\frac{y}{x}\Big) = \mu_y - C\Big(\frac{y}{x}, x\Big). \qquad (40.4)$$

(40.3-4) show that both estimators are in general biassed. Furthermore, since a covariance between two variables cannot exceed the product of their standard deviations (by the Cauchy-Schwarz inequality), we see that

$$|E(\hat\mu_y) - \mu_y| \leq \Big\{V\Big(\frac{m_y}{m_x}\Big) V(m_x)\Big\}^{1/2}, \qquad |E(\tilde\mu_y) - \mu_y| \leq \Big\{V\Big(\frac{y}{x}\Big) V(x)\Big\}^{1/2}. \qquad (40.5)$$

(40.5) shows that there is a radical difference between the estimators, since as sample size $n \to \infty$, $V(m_y/m_x)$ and $V(m_x)$, variances of sample means, are of order $n^{-1}$, and so is the bias in $\hat\mu_y$; no such effect occurs with $\tilde\mu_y$, since $V(y/x)$ and $V(x)$ do not depend on $n$ at all. In fact, it is easy to see from their definitions (40.1-2) that $\hat\mu_y$ is a consistent estimator, since $m_y \to \mu_y$ and $m_x \to \mu_x$; but that $\tilde\mu_y \to \mu_x \mu_{y/x}$, which will not in general be equal to $\mu_y$. The bias in $\hat\mu_y$ is studied in detail in 40.9 below. First, we see how the bias in $\tilde\mu_y$ may be removed.

40.3 From (40.4) it is clear that we only need an unbiassed estimator of $C(y/x, x)$ to eliminate the bias in $\tilde\mu_y$. Since $y/x$ is observed for every sample member, we can calculate the sample covariance of $y/x$ and $x$. By the bivariate analogue of (12.109), it is the $k$-statistic $k_{11}$ in the sample which is unbiassed for $K_{11}$ in the finite population, and thus the unbiassed estimator of the covariance in the population is

$$\frac{N-1}{N} k_{11} = \frac{N-1}{N} \frac{n}{n-1} (m_y - m_{y/x} m_x). \qquad (40.6)$$

Thus, from (40.2) and (40.4), an unbiassed estimator of $\mu_y$ is

$$\tilde\mu'_y = \mu_x m_{y/x} + \frac{N-1}{N} \frac{n}{n-1} (m_y - m_{y/x} m_x), \qquad (40.7)$$

first proposed by Hartley and Ross (1954). If $1/N$ is negligible, it reduces to

$$\tilde\mu'_y = \mu_x m_{y/x} + \frac{n}{n-1} (m_y - m_{y/x} m_x). \qquad (40.8)$$
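The unbiassedness of (40.7) can be checked numerically on a small artificial population by averaging the estimator over every possible sample of size $n$ drawn with equal probabilities without replacement. The Python sketch below does this; the population values and the function names are invented for illustration.

```python
from itertools import combinations
from statistics import mean


def hartley_ross(x, y, mu_x, N):
    """Estimator (40.7) from sample values x, y, known mu_x and population size N."""
    n = len(x)
    ratios = [yi / xi for xi, yi in zip(x, y)]
    m_r, m_x, m_y = mean(ratios), mean(x), mean(y)
    correction = (N - 1) / N * n / (n - 1) * (m_y - m_r * m_x)
    return mu_x * m_r + correction


def average_over_all_samples(pop_x, pop_y, n):
    """Average (40.7) over all samples of size n; each sample is equally likely."""
    N = len(pop_x)
    mu_x = mean(pop_x)
    values = [
        hartley_ross([pop_x[i] for i in s], [pop_y[i] for i in s], mu_x, N)
        for s in combinations(range(N), n)
    ]
    return mean(values)


# Invented population of N = 5 members.
pop_x = [1.0, 2.0, 3.0, 5.0, 8.0]
pop_y = [2.0, 3.0, 7.0, 9.0, 20.0]
expected = average_over_all_samples(pop_x, pop_y, n=3)
target = mean(pop_y)  # the population mean mu_y
```

Apart from rounding error, `expected` and `target` coincide, whereas the same averaging applied to the uncorrected mean of ratios would not in general reproduce $\mu_y$.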
40.4 For the estimator (40.1), the covariance which we now need to estimate is $C(m_y/m_x, m_x)$ in (40.3), and a single sample supplies only one value of $m_y/m_x$ and of $m_x$. However, a simple approximation is easily obtained. When $n = 1$, $C(m_y/m_x, m_x)$ is identical with $C(y/x, x)$. Moreover, the covariance between sample means of jointly distributed variables is inversely proportional to sample size (this follows, e.g., from Rule 10 for $k$-statistics in 12.14, or may easily be proved directly), and the same will hold approximately here, where we seek the covariance of one mean and the ratio of another to the first mean. Thus

$$C\Big(\frac{m_y}{m_x}, m_x\Big) \approx \frac{1}{n} C\Big(\frac{y}{x}, x\Big),$$

and using (40.6) we find from (40.3) the approximately unbiassed estimator

$$\hat\mu'_y = \mu_x \frac{m_y}{m_x} + \frac{N-1}{N} \frac{1}{n-1} (m_y - m_{y/x} m_x), \qquad (40.9)$$

a result differently obtained by Nieto de Pascual (1961). The absence of the factor $n$ in the second term of (40.9), compared with (40.7), again illustrates the different orders of magnitude of the biases in (40.1-2).

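40.5 We now have to examine the variances of the alternative modified ratio estimators (40.7) and (40.9) as a guide to choosing between them in different circumstances.

We consider only the case when $N \to \infty$, so that sampling is effectively simple random. Using (40.6), we rewrite (40.7) as

$$\tilde\mu'_y = \mu_x m_{y/x} + k_{11}, \qquad (40.10)$$

where $k_{11}$ is the $k$-statistic of the variables $y/x$ and $x$. Thus

$$V(\tilde\mu'_y) = \mu_x^2 V(m_{y/x}) + 2\mu_x C(m_{y/x}, k_{11}) + V(k_{11}), \qquad (40.11)$$

which in the notation of 13.2 is written

$$V(\tilde\mu'_y) = \mu_x^2 V(m_{y/x}) + 2\mu_x\, \kappa\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + \kappa\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \qquad (40.12)$$

Now by 12.14,

$$\kappa\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \frac{\kappa_{21}}{n} = \frac{\mu_{21}}{n}, \qquad (40.13)$$

using (3.80); while (13.7) and (3.81) give

$$\kappa\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \frac{\mu_{22}}{n} + \frac{\mu_{20}\mu_{02}}{n(n-1)} - \frac{(n-2)\mu_{11}^2}{n(n-1)}, \qquad (40.14)$$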
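where the cumulants and moments are those of $y/x$ and $x$. Thus

$$n V(\tilde\mu'_y) = \mu_x^2 \mu_{20} + 2\mu_x \mu_{21} + \mu_{22} + \frac{\mu_{20}\mu_{02}}{n-1} - \frac{(n-2)\mu_{11}^2}{n-1}. \qquad (40.15)$$

(40.15) may be usefully simplified. By definition

$$V\{(y/x - \mu_{y/x})(x - \mu_x)\} = \mu_{22} - \mu_{11}^2$$

in the notation of (40.15), which is thus equivalent to

$$n V(\tilde\mu'_y) = \mu_x^2 \mu_{20} + 2\mu_x \mu_{21} + V\{(y/x - \mu_{y/x})(x - \mu_x)\} + \frac{1}{n-1}(\mu_{20}\mu_{02} + \mu_{11}^2). \qquad (40.16)$$

Now consider the identity

$$y - \mu_{y/x}\, x \equiv (y/x - \mu_{y/x})(x - \mu_x) + \mu_x (y/x - \mu_{y/x}). \qquad (40.17)$$

If we take the variances of both sides of (40.17), we find that (40.16) becomes

$$n V(\tilde\mu'_y) = V(y - \mu_{y/x}\, x) + \frac{1}{n-1}(\mu_{20}\mu_{02} + \mu_{11}^2),$$

and returning to our original notation, this is

$$n V(\tilde\mu'_y) = V(y) + \mu_{y/x}^2 V(x) - 2\mu_{y/x} C(y, x) + \frac{1}{n-1}\Big\{V\Big(\frac{y}{x}\Big) V(x) + C^2\Big(\frac{y}{x}, x\Big)\Big\}, \qquad (40.18)$$

a result obtained by Goodman and Hartley (1958). As $n \to \infty$, the term in braces in (40.18) may be neglected and

$$n V(\tilde\mu'_y) \sim V(y) + \mu_{y/x}^2 V(x) - 2\mu_{y/x} C(y, x). \qquad (40.19)$$

(40.18) is most easily estimated by expressing its form (40.12) in terms of cumulants and using $k$-statistics to estimate them; Goodman and Hartley (1958) give computing formulae.

Robson (1957) generalized (40.18), and also its unbiassed estimator, to take account of the finiteness of the population.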

I 

40.6 We may similarly obtain the variance of (40.9), which we rewrite analogously to (40.10) as

$$\hat\mu'_y = \mu_x \frac{m_y}{m_x} + \frac{k_{11}}{n}. \qquad (40.20)$$

We see that 









SO 


that 
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- - iid in derivins by ' ormation « 

4 

W n *\^%< k uj + n^u)} 

’ (4Q9 

isff .;sL'r;“","i;;£ a«|) - - * 

approximation may therefore be written 1 term on the ri ght. Our 

K » - ^©M^- 

Since the bias in the unadjusted estimator $\hat\mu_y$ was seen in 40.2 to be of order $n^{-1}$, the bias in $\hat\mu'_y$ is of no greater order, and its square will be of no greater order than $n^{-2}$. Thus the mean-square error is

$$E\{(\hat\mu'_y - \mu_y)^2\} = \mu_x^2\, V\Big(\frac{m_y}{m_x}\Big)\Big\{1 + O\Big(\frac{1}{n}\Big)\Big\}. \qquad (40.23)$$

A more precise approximation is given by Nieto de Pascual (1961). The leading term in (40.22) and (40.23) is the variance of the unmodified estimator $\hat\mu_y$, and is easily evaluated to order $n^{-1}$ by using (10.17), which here gives

$$V\Big(\frac{m_y}{m_x}\Big) = \frac{1}{n} \frac{\mu_y^2}{\mu_x^2} \Big\{\frac{V(y)}{\mu_y^2} + \frac{V(x)}{\mu_x^2} - \frac{2C(y, x)}{\mu_y \mu_x}\Big\}. \qquad (40.24)$$

Thus (40.23) becomes

$$n E\{(\hat\mu'_y - \mu_y)^2\} \sim V(y) + \frac{\mu_y^2}{\mu_x^2} V(x) - 2\frac{\mu_y}{\mu_x} C(y, x). \qquad (40.25)$$

(40.25) may be estimated with slight bias by replacing $\mu_y/\mu_x$ by $m_y/m_x$ and the variances and covariance by their unbiassed estimators. This gives an estimator of the mean-square error

$$\hat E\{(\hat\mu'_y - \mu_y)^2\} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \Big(y_i - \frac{m_y}{m_x} x_i\Big)^2.$$
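The mean-square-error estimator just given is simple to compute. The following Python sketch is one way of doing so, assuming effectively infinite population size as in this section; the function name `ratio_mse_estimate` is hypothetical.

```python
def ratio_mse_estimate(x, y):
    """Estimate the mean-square error of the ratio-of-means estimator.

    Computes sum_i (y_i - R x_i)^2 / (n (n - 1)) with R = m_y / m_x, which is
    what results from putting sample quantities into the large-sample form.
    """
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are needed")
    r_hat = sum(y) / sum(x)  # equals m_y / m_x
    residual_ss = sum((yi - r_hat * xi) ** 2 for xi, yi in zip(x, y))
    return residual_ss / (n * (n - 1))


# Invented sample: R = 1, residuals 0, 1, -1.
estimate = ratio_mse_estimate([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```

The estimate vanishes when $y$ is exactly proportional to $x$ in the sample, which is the case in which the ratio estimator is most advantageous.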


40.7 We now compare (40.19) and (40.25). Following Goodman and Hartley 
(1958), the difference may be written 

(40.26) makes it clear that the modified ratio " " 
efficient than the modified mean of ratios estimator ft, according 
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;*£’ SS ifes^'as: ;« * - »-«: 

more common, and the 8 estimator m„ if the run, 

biassed (40.7). (han the ordinary sample m gh t . 

m ’‘ffiSmZu* than its first term, >.e. 
hand side of (40.25) (40.27) 

P‘' x> 2fa 

• ( i' compared to both [x y and m y , in ter^ 

We have thus characterized ft* and the population ratio 

„°f meansX D,a an“ | «sentially estimates by the sample ratio of means *,/»„ 

this is as we should expect. , ,7 to the case where x is 9 

Olkin (1958) generalises the theory of the unmodified estimator $\hat\mu_y$ to the case where $x$ is a vector.

40.8 The approximately unbiassed estimator (40.9) was obtained by directly estimating the bias in $\hat\mu_y$ given by (40.5). We could, alternatively, have reduced the order of magnitude of the bias by using Quenouille's method, described in 17.10. This would involve the calculation of $\hat\mu_y$ for each of the $n$ different samples of size $(n-1)$ which exclude a single observation, averaging these $n$ values, and using (17.10) to obtain a modified estimator with bias of order $n^{-2}$ and variance unaffected to order $n^{-1}$ (cf. Exercise 17.18), as was seen to be the case for $\hat\mu'_y$ in (40.23).
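The leave-one-out procedure just described can be sketched as follows. This assumes the usual form of Quenouille's correction, n times the full-sample value minus (n - 1) times the average of the n leave-one-out values, which is taken here to be what (17.10) denotes; the function name is hypothetical.

```python
def quenouille_ratio(y, x, mu_x):
    """Quenouille-corrected ratio estimate of mu_y.

    Uses n * (full-sample estimate) - (n - 1) * (mean of leave-one-out estimates).
    """
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("need paired observations with n >= 2")
    sy, sx = sum(y), sum(x)
    full = mu_x * sy / sx
    # Ratio estimate recomputed with each observation omitted in turn.
    leave_one_out = [mu_x * (sy - yi) / (sx - xi) for yi, xi in zip(y, x)]
    return n * full - (n - 1) * sum(leave_one_out) / n
```

Working from the two sample totals keeps the cost linear in n, since each leave-one-out ratio is obtained by subtracting a single pair of values.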

Durbin (1959a) used a simpler form of Quenouille's method to modify a general type of ratio estimator of form $r = t_y/t_x$ (which includes (40.1) as a particular case), whose bias in estimating $E(t_y)/E(t_x)$ is assumed to be of order $n^{-1}$. If the same statistic $r$ is calculated for the first $\tfrac12 n$ and second $\tfrac12 n$ observations ($n$ even) and denoted by $r_1$, $r_2$ respectively, the modified estimator is

$$t(r) = 2r - \tfrac12(r_1 + r_2). \qquad (40.28)$$
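A short sketch of the split-half modification (40.28) for the simplest case, in which the statistic is the ratio of sample totals. The halves are taken in the order in which the observations are supplied, and the function name is illustrative only.

```python
def split_half_ratio(y, x):
    """Modified ratio t(r) = 2r - (r1 + r2)/2 from two half-samples."""
    n = len(y)
    if n != len(x) or n < 2 or n % 2:
        raise ValueError("need an even number of paired observations")
    h = n // 2
    r = sum(y) / sum(x)                # whole-sample ratio
    r1 = sum(y[:h]) / sum(x[:h])       # first half
    r2 = sum(y[h:]) / sum(x[h:])       # second half
    return 2 * r - 0.5 * (r1 + r2)
```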

If the regression of $t_y$ on $t_x$ is linear with constant variance of order $n^{-1}$, and $t_x$ itself is normally distributed with variance of order $n^{-1}$, (40.28) has bias of order $n^{-2}$ and variance which agrees to order $n^{-1}$ with that of $r$, but is smaller asymptotically when terms of order $n^{-2}$ are taken into account. A similar result holds when $t_x$ has a Gamma distribution.

bias and mean-sqmre eroT ' ha ‘ "" ° f Quenouill(i ’s original method gives even smaller 
This result is more general fhon + c 

assumption th* f S 1S c ^ rta ^ n ^y true of m and m ’ or< ^ er 11 1 by the Central 

t0 be lost by bias-eP S ? tlsf ! ed - follows that in”* u - ^~T and tJle Iin ear regression 
uumation using (40.28). SUCh a Sltuati °n there is nothing 



simple numerical example with „ _ „

        Population values of mean-square error

    Estimator (defining equation)        Mean-square error
    (40.28) applied to (40.1)                  0.38
    (40.9)                                     0.44
    (40.7)                                     0.56
    (40.1)                                     0.92
    (40.2)                                     2.41
    Sample mean $m_y$                          2.67
                                                         (40.29)

Exercise 40.17 asks the reader to verify these values.



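40.9 If we write (40.1) in the form

$$\frac{m_y}{m_x} = \frac{m_y}{\mu_x}\left\{1 + \frac{m_x - \mu_x}{\mu_x}\right\}^{-1} \qquad (40.30)$$

and expand the negative binomial into a Taylor series, valid with probability 1 as $N, n \to \infty$, we find on taking expectations

$$E\left(\frac{m_y}{m_x}\right) = \frac{\mu_y}{\mu_x}\left\{1 + \left(\frac1n - \frac1N\right)\left(\frac{K_{20}}{\mu_x^2} - \frac{K_{11}}{\mu_x\mu_y}\right)\right\} + O\left(\frac{1}{M^2}\right), \qquad (40.31)$$

where $M$ stands for $n$ or $N$ indifferently. Thus the estimator

$$u = \frac{m_y}{m_x}\left\{1 - \left(\frac1n - \frac1N\right)\left(\frac{s_x^2}{m_x^2} - \frac{s_{xy}}{m_x m_y}\right)\right\} \qquad (40.32)$$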

has the first-order bias of $m_y/m_x$ removed. In (40.32), the sample variance and covariance are defined with $(n-1)$ as divisor, as usual.
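As an illustration, the bias-adjusted ratio can be computed as below. This is a sketch which assumes that (40.32) has the form of Tin's (1965) modified ratio, namely the sample ratio multiplied by one minus (1/n - 1/N) times the difference between the squared coefficient of variation of x and the relative covariance of x and y. The function name and argument order are illustrative.

```python
def adjusted_ratio(y, x, N):
    """Ratio m_y/m_x with its first-order bias removed (assumed form of u)."""
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("need paired observations with n >= 2")
    my, mx = sum(y) / n, sum(x) / n
    # Sample variance and covariance with divisor (n - 1).
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    a1 = 1.0 / n - 1.0 / N
    return (my / mx) * (1.0 - a1 * (sxx / mx ** 2 - sxy / (mx * my)))
```

If y is exactly proportional to x the two relative moments coincide and the adjustment vanishes, so the function returns the plain ratio.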

It is a straightforward, though somewhat tedious, matter to evaluate the mean and 
variance of u, using the results (the first three of which were given at (12.117), (12.119) 
and (12.121), the remainder being derivable by the methods of Example 13.2): 


- 20 > 


\ 


(40.33) 


E(m x ~ix x y = 

E(m x - Mx y = 3«?Ki,+0(»-»). 

Efa-pM-K*) = ocj K m 

E ( m x-Px) 2 { m u~Pv) = a 2 ^ 21 , 

E(m x -fi x Y{m v -fi y ) = a \K M K n + 0{ir% 

E(sl - K 20 ){m y - /Ay) = aj K Ui 

E{m x -p x ){s xu -K n ) = ^K 21 , 

E{s xv -K n )(m y -n y ) = *iK lt . 

Here $a_r = (n^{-r} - N^{-r})$ as at (12.116), and we have dropped the suffix $N$ to $E$, as throughout this and the last chapter. Tin (1965) gives the results to order $n^{-2}$,


E(u) = ^^1 — ^2a a — _ ^o) “ 3a? C ao (C 2 o _ C" 11 ) J’ 


(40.34) 
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(C. + C„„ -- 2C„) + «i (2Ci ■- «C- Ca + Cf, + C„ C os ) 


+ ^(C 30 -2C 2I + C I2 )|, 


where

$$C_{rs} = K_{rs}/(\mu_x^r\mu_y^s).$$

In a precisely similar way, Tin (1965) gives the results for the simple ratio estimator


E 


^ |l + «j(C 20 - C n ) + (a 2 - ^ (C 21 - C 30 ) + 3af C 20 (C 20 C u )j, (4 0 ,3 6 ) 

= ^ 2 |a 1 (C 20 +C 02 -2C 11 ) + af(8C| 0 -16C 20 C 1I + 5C'f 1 + 3C 20 C 02 ) 




- 2 ( a 2 - ^) (C 30 - 2 C 2J + C 12 ) 


(40.37) 


while for $t(m_y/m_x)$ defined by (40.28), he finds


7/7 


= ^-){ i -( c *«- c »)/ iv - 2o! "( c »>- c »») 


+ 3f-.~'lc»(C»-C u )' 


{n“ 


N 2 j 


(40.38) 


3 


V {‘&] ~ fe) a f>(^ + C 02 -2C Il)+ 2g-^ + ^)c 20 (C M -2C 1I ) 
+/l-A+J-)c* 

\n 2 nN TV 2 / 11 \« 2 zzTV 
+ (C30 ~ 2C 21 + C 12 )^. 






'H 


^N + W*) C “ C °= 


/ 


(40.39) 


These results make it clear that the bias in $u$ and in $t(m_y/m_x)$ is very small, with no term of order $n^{-1}$ in either, as opposed to the bias in $m_y/m_x$. All three variances have the same leading term, which we have already encountered at (40.24), where we saw it to be also the mean-square error of $\hat\mu'_y/\mu_x$ defined by (40.9). The reader is left to show in Exercise 40.13 that to the next order of approximation, we have


$$V(u) < V\left\{t\left(\frac{m_y}{m_x}\right)\right\} < V\left(\frac{m_y}{m_x}\right). \qquad (40.40)$$

Thus $u$ seems preferable to the other estimators considered here on grounds of bias and mean-square error.


and n^&t^r 1 rb.rriate IrUntoftert^ionr^ ^ 




Regression estimators

40.10 Given that we know the population mean $\mu_x$ of a supplementary variable, as in 40.2, it is natural to consider the application of the theory of regression to improve efficiency in the estimation of $\mu_y$. The linear regression estimator is

$$m_y + b(\mu_x - m_x). \qquad (40.41)$$

This is a generalization of (40.1), to which it reduces if we choose $b = m_y/m_x$. However, $b$ is usually chosen as the LS regression coefficient of $y$ upon $x$.

If the linear model (19.8) holds for the relationship between $y$ and $x$, the LS theory of Chapter 19 and of 28.12 onwards holds; however, we are dealing here with a finite population, and in any case $x$ is a random variable in most applications, so that the LS theory cannot hold exactly. Instead, we observe that $m_y$ and $m_x$ are jointly distributed with means $\mu_y$, $\mu_x$, variances $V(y)/n$, $V(x)/n$ and covariance $C(y,x)/n$. Thus, from (40.41), if we ignore sampling errors in $b$, however chosen,

$$V\{m_y + b(\mu_x - m_x)\} = \frac1n\{V(y) + b^2V(x) - 2bC(y,x)\}, \qquad (40.42)$$

with unbiassed estimator

$$\hat V\{m_y + b(\mu_x - m_x)\} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\{(y_i - m_y) - b(x_i - m_x)\}^2. \qquad (40.43)$$

The asymptotic formula below (40.25) for the estimated mean-square error of $\hat\mu'_y$, which is also its asymptotic variance and that of $\hat\mu_y$, is derivable from (40.43) by putting $b = m_y/m_x$ above.

40.11 By the Central Limit theorem, $m_y$ and $m_x$ will usually be asymptotically normal, and hence so will (40.41). The regression estimator is more efficient than the sample mean estimator $m_y$ if the right-hand side of (40.42) is less than its first term, i.e. if $2bC(y,x) > b^2V(x)$. If $b$ is the LS coefficient, this condition is always satisfied. Similarly, comparing (40.42) with (40.25), we see that the condition for the regression estimator to be more efficient than the ratio estimator is

$$V(x)\left\{b^2 - \left(\frac{\mu_y}{\mu_x}\right)^2\right\} < 2C(y,x)\left\{b - \frac{\mu_y}{\mu_x}\right\}. \qquad (40.44)$$

If $b$ is the LS coefficient and tends to $C(y,x)/V(x)$, (40.44) reduces asymptotically to

$$V(x)\left(b - \frac{\mu_y}{\mu_x}\right)^2 > 0, \qquad (40.45)$$

which holds except when $b = \mu_y/\mu_x$ when, as we have seen, (40.41) will reduce to (40.1) asymptotically. There is thus nothing to be lost by using the LS regression estimator, at least asymptotically.
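The LS regression estimator and a variance estimate of the residual-sum-of-squares kind can be sketched as follows. It is assumed here that the estimator is the sample mean of y adjusted by b times the gap between the known population mean of x and its sample mean, that b is the sample LS slope, and that sampling error in b is ignored; the function name is hypothetical.

```python
def regression_estimate(y, x, mu_x):
    """Return (estimate of mu_y, LS slope b, estimated variance)."""
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("need paired observations with n >= 2")
    my, mx = sum(y) / n, sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary for the LS slope to exist")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                                  # LS coefficient of y on x
    estimate = my + b * (mu_x - mx)
    # Residual sum of squares about the fitted line, divided by n(n - 1).
    rss = sum(((yi - my) - b * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    return estimate, b, rss / (n * (n - 1))
```

With an exactly linear relation the residuals vanish, so the estimated variance is zero and the estimate is the value of the line at the population mean of x.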


40.12 The regression estimator (40.41) is, of course, biassed. To remove this 
bias, we discuss general methods for constructing unbiassed estimators, due to Mickey 
(1959) and W. H. Williams (1961, 1962), which will also throw light upon our earlier 
problems in ratio estimation. 

Unbiassed estimation with a supplementary variable

40.13 We begin from the observation that, for any constant $a$, the estimator

$$m_y - a(m_x - \mu_x) \qquad (40.46)$$

will be unbiassed for $\mu_y$, but this will not generally be so if $a$ is a statistic calculated from the same sample. Suppose now that the sample of $n$ observations is split, in the order of drawing, into a subsample of $p$ observations and a "remainder" sample of $(n-p)$ observations. If $a = a(p)$ is calculated from the subsample alone, then, since the remainder sample is a random sample from the remainder population of $(N-p)$ members, conditionally upon the subsample, this will give us an unbiassed estimator of the mean of this remainder population. Moreover, we can express the means in the remainder sample and the remainder population in terms of the sample and population means and those of the subsample, distinguished by the argument $(p)$. Thus (40.46) gives an estimator

$$t' = \frac{n m_y - p\,m_y(p)}{n-p} - a(p)\left\{\frac{n m_x - p\,m_x(p)}{n-p} - \frac{N\mu_x - p\,m_x(p)}{N-p}\right\}$$

which will be unbiassed for $\{N\mu_y - p\,m_y(p)\}/(N-p)$. Thus the unbiassed estimator of $\mu_y$ itself is $\{(N-p)t' + p\,m_y(p)\}/N$, which we write

$$t_p = m_y - a(p)(m_x - \mu_x) - \frac{p(N-n)}{N(n-p)}\left[\{m_y(p) - m_y\} - a(p)\{m_x(p) - m_x\}\right]. \qquad (40.47)$$

The choice of an integer $p$ is arbitrary in $1 \le p \le n-1$. For given $p$, the function $a(p)$ of the subsample is also arbitrary. We therefore have a large class of unbiassed estimators of $\mu_y$ which make use of our knowledge of $\mu_x$.

Exactly the same argument holds in the multivariate situation where x is a vector. 

40.14 An undesirable feature of the general class of estimators (40.47) is that they depend on the order in which the sample is drawn. We can overcome this by considering $t_p$ for every one of the $n!$ possible orderings of the sample and averaging to obtain $\bar t_p$; this average sometimes takes a simple form requiring little computation from the sample. (This averaging process is exactly the same as we carried out in 39.11 for similar reasons, although there the results were not computationally simple because sampling was with unequal probabilities.) Exercise 39.30, which now simplifies since we are sampling with equal probabilities, shows that the averaged estimator $\bar t_p$ has variance which is never greater than that of any single $t_p$.

40.15 If in (40.47) we choose $a(p) = \bar r(p) = \frac1p\sum_{i=1}^{p} y_i/x_i$ and $p = 1$, it reduces to

$$t_1 = \frac{y_1}{x_1}\,\mu_x + \frac{n(N-1)}{N(n-1)}\left(m_y - \frac{y_1}{x_1}\,m_x\right),$$

where $y_1, x_1$ are the values of the first observation drawn. Averaging over all $n!$ orderings of the sample, we obtain

$$\bar t_1 = \bar r\mu_x + \frac{n(N-1)}{N(n-1)}(m_y - \bar r m_x), \qquad (40.48)$$

where $\bar r$ is the mean of the $n$ sample ratios $y_i/x_i$; this is identical with (40.7). Furthermore, if we choose any other value of $p$ and the same $a(p)$ as above, the average value $\bar t_p$ will be the same as (40.48), although, of course, $t_p$ itself will differ. Thus this choice of $a(p)$ yields only one averaged unbiassed estimator.

A different estimator is obtained by putting $a(p) = m_y(p)/m_x(p)$ in (40.47), which, when averaged, gives a similar form with $m_y(p)/m_x(p)$ replaced by its average. If $p = 1$, of course, it reduces to (40.48), since $a(p)$ is then exactly what we had previously. The next simplest choice is $p = n-1$, when the average value of $m_y(p)/m_x(p)$ over all permutations is seen to be $\frac1n\sum_{i=1}^{n}(n m_y - y_i)/(n m_x - x_i) = \bar R$. Thus

$$\bar t_{n-1} = \mu_x\bar R + \frac{N-n+1}{N}\,n\,(m_y - \bar R m_x) \qquad (40.49)$$

is an unbiassed estimator of $\mu_y$.
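Exact unbiassedness of such estimators can be checked numerically by averaging over every possible equal-probabilities sample from a small artificial population. The sketch below assumes the two averaged forms are (i) the mean-of-ratios estimator with the multiplier n(N-1)/(N(n-1)) and (ii) the leave-one-out ratio form with the multiplier n(N-n+1)/N; the function names and the toy population used to exercise them are illustrative, not taken from the text.

```python
from itertools import combinations

def mean_of_ratios_unbiased(y, x, mu_x, N):
    """Averaged estimator for p = 1: uses the mean of the ratios y_i / x_i."""
    n = len(y)
    rbar = sum(yi / xi for yi, xi in zip(y, x)) / n
    my, mx = sum(y) / n, sum(x) / n
    return rbar * mu_x + n * (N - 1) / (N * (n - 1)) * (my - rbar * mx)

def leave_one_out_unbiased(y, x, mu_x, N):
    """Averaged estimator for p = n - 1: uses the mean leave-one-out ratio."""
    n = len(y)
    sy, sx = sum(y), sum(x)
    Rbar = sum((sy - yi) / (sx - xi) for yi, xi in zip(y, x)) / n
    return Rbar * mu_x + (N - n + 1) / N * n * (sy / n - Rbar * sx / n)

def average_over_all_samples(estimator, pop_y, pop_x, n):
    """Average an estimator over all samples of size n drawn without replacement."""
    N = len(pop_y)
    mu_x = sum(pop_x) / N
    values = [
        estimator([pop_y[i] for i in idx], [pop_x[i] for i in idx], mu_x, N)
        for idx in combinations(range(N), n)
    ]
    return sum(values) / len(values)
```

Because every unordered sample is equally likely, the plain average over `combinations` is the expectation of the estimator, so it should reproduce the population mean of y up to rounding error.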

40.16 Turning now to regression estimators, it is natural to investigate the choice $a(p) = b_{yx}(p)$. (40.47) reduces to

$$t_p = m_y - b_{yx}(p)(m_x - \mu_x) - \frac{p(N-n)}{N(n-p)}\left[\{m_y(p) - m_y\} - b_{yx}(p)\{m_x(p) - m_x\}\right]. \qquad (40.50)$$

Averaging simply replaces $b_{yx}(p)$ by its average. $p = 1$ is now impossible, since $b_{yx}$ is then nugatory. As before, $p = n-1$ is the next simplest, involving the calculation of the regression coefficient $n$ times, omitting each of the observations in turn, and averaging to obtain $\bar b_{yx}(n-1)$. (40.50) reduces to

$$\bar t_{n-1} = m_y - \bar b_{yx}(n-1)(m_x - \mu_x) - \frac{N-n}{Nn}\left\{\sum_{i=1}^{n} b_{yx}(n-1)\,x_i - n\,\bar b_{yx}(n-1)\,m_x\right\}, \qquad (40.51)$$

where $x_i$ in the summation is the value omitted in calculating the $b_{yx}(n-1)$ which it multiplies. (40.51) is equivalent to the usual regression estimator (40.41) if all $b_{yx}(n-1)$ are the same, but not in general otherwise. However, when $n$ is large, the $b_{yx}(n-1)$ can vary very little, and the estimators differ correspondingly little.

Estimation of variance 

40.17 The sampling variance of the unbiassed estimators (40.47) cannot be generally investigated, since everything depends upon the choice of $a(p)$. However, if we modify the estimation scheme slightly, we can at once obtain estimators whose variance can be estimated.

Suppose that the $n$ observations are split into $k$ subsamples as they are drawn, the $r$th subsample containing $n_r$ observations, $\sum_r n_r = n$. We write the partial sum $n_{+r} = \sum_{s=1}^{r} n_s$, so that $n_{+1} = n_1$ and $n_{+k} = n$. We re-label the estimator (40.47) as $t(p, n)$ to signify that a subsample of $p$ is used in a sample of size $n$. Consider the sequence of $(k-1)$ estimators $t(n_{+1}, n_{+2}), t(n_{+2}, n_{+3}), \ldots, t(n_{+(k-1)}, n_{+k})$, which each
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, nlete sample of the previous es tations ^ ay 

into the £ subsamples, and for r<s 

E{t(n+r>n+(r+i))> K n +» * +(#+1 E '' Et(n +S , #+(,+d)} 

_ E .. • E {t{n+ri n +(r+ l » r % * £ 

= E...E {t(n^,n+(r+i))M»} = ^ > 

i r+1 . q 19 to estimate the variance of the mean 0 f 

- -1 - “5 —"S- «■*>“ m 

40.18 In 40.17 we did not require that the $k$ subsamples be of equal size. We now suppose that they are, so that $n_r = n/k$. If we now use each of the subsamples in turn to evaluate a particular $a(p) = a(n/k)$ in (40.47), and calculate $t(n/k, n)$ each time, its $k$ values will no longer be uncorrelated as in 40.17. Their mean is

$$\bar t = m_y - \bar a(n/k)(m_x - \mu_x) + \frac{N-n}{N(k-1)}\left\{\overline{a(n/k)\,m_x(n/k)} - \bar a(n/k)\,m_x\right\}, \qquad (40.52)$$

where a bar denotes averaging over the $k$ values obtained. The first two terms on the right of (40.52) are precisely of the form (40.46), but are not unbiassed because $\bar a(n/k)$ is calculated from the same sample as $m_x$; their expectation is $\mu_y - C\{\bar a(n/k), m_x\}$. The last term in (40.52) is evidently an unbiassed estimator of this covariance if the population is regarded as consisting of groups of size $n/k$, of which $k$ are selected at random for the sample.

Modification of sampling scheme to eliminate bias 

ohapto'Ubercoteraed'tte fom ^ ** 1 

reduce bias in equal-probabilities „ m Ii; g -.u ° t " e est,mat °i' to eliminate or 

at'rTdere h d e Itatr “ ‘° Ch “ ge ** SampHn g ^heme^ttothe oH^inS esuWo^ | 

; ■f* ■“ ta ~" - k «-k * ? - A 

*<■ Su PP°^, then, that we choose * f 

n i ~ nx. t 


1 


w 

V 


1=1 


*<» 


(40.53) 
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biassed. 0» the other hand . > f «e regard the set of (N\ ( ^ “ exac % ™- 

„ ou lation from which one member is to be draw,, W P ° SSlble Samples 38 *e 
?/ 0 f these samples, the same argument shows titaUf "etj ‘T ° f the ««» 
with , vi we make the single selection 


V 


, x /© 

“* = ( "V ,?,«=(».)< 



(40.54) 


m. 


X 


T ~ ^ m y/m x , 


the estimator (39.20) becomes 

t 

“ Tte v“ ’ 1 of ( r s ) e b e e Za?„:r»d “d ZSZtnt * ^ (1?51 > 

as usual from (39.23^). In the case of of course, a, least ‘twoXwot" 
subsamples of one sample) are necessary for variance estimation to be possible.
Nanjamma et al. (1959) discuss the general problem of modifying the sampling scheme to render ratio estimators unbiassed, with applications to several types of survey design. See also Pathak (1964a).

Stratified and multi-stage sampling 

40.20 Any ratio or regression estimator may be applied separately within each 
of a number of strata, provided that the population mean of x is known within each 
stratum. Alternatively, a single ratio or regression estimator may be applied using 
the combined results from all strata. We should expect the former procedure to be 
the more efficient in general. The details are given by Cochran (1963) for biassed 
ratio and regression estimators. 

Unbiassed stratified ratio estimators are discussed by Nieto de Pascual (1961) and W. H. Williams (1961) in the univariate case and by Olkin (1958) for multivariate situations. Robson and Vithayasai (1961) consider a stratification-like situation where $y$ and $x$ can be expressed as the sum of $k$ corresponding components. Kish and Hess (1959) derive asymptotic formulae for the variance of the biassed combined ratio estimator in stratified multi-stage sampling.

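40.21 Durbin (1953) pointed out that since, from (40.30),

$$\frac{m_y}{m_x} = \frac{\mu_y}{\mu_x} + \frac{1}{\mu_x}\left(m_y - \frac{\mu_y}{\mu_x}\,m_x\right) + o(n^{-1/2}), \qquad (40.55)$$

the ratio of sample means is asymptotically linear in $y - (\mu_y/\mu_x)x = z$, so that we have

$$\frac{m_y}{m_x} - \frac{\mu_y}{\mu_x} \sim \frac{1}{\mu_x}\cdot\frac1n\sum_{i=1}^{n} z_i. \qquad (40.56)$$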
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I. follows that for the es.imat.on o [mj ru le 0 f 39.46 and 39.49 . 

3 , 45-50 applies «a -parat^whh.n each stratum of > 

nudti- 8 tage ^design^ 1 ^ 6 ' same applies for regress 1011 

Two-phase sampling

40.22 Our discussions of stratified sampling (39.15-27), and the use of an auxiliary variable to improve estimation efficiency in this chapter, both presupposed some knowledge of the population which made unbiassed estimation possible. In the latter case, it was $\mu_x$ which had to be known, and in the former the relative sizes of the strata, $N_l/N$, which are required by the estimator (39.38). If this essential information is not available, a procedure which sometimes suggests itself on practical grounds is to carry out a preliminary equal-probabilities random sample to obtain it, and follow this by the main sample devoted to the original purpose of estimating the population mean. Clearly, such a procedure will only be economically acceptable if the cost of the preliminary sample is small relative to the gain in efficiency achieved in the main sample as a result; we make this point more precise later.

A sampling scheme of this kind is called two-phase sampling. (We do not use the older name double sampling, which has already been used in Chapter 34 for a sequential method whose aim is to achieve a confidence interval of prescribed length and coefficient. This is certainly not our purpose here, where we aim primarily to improve efficiency of estimation at the second phase by collecting auxiliary information at the first phase.) Two-phase sampling is distinguished from two-stage sampling by the fact that it uses the same sampling units at each phase of sampling.

40.23 Following Neyman (1938), who first solved the problem, we consider the stratification problem first. We wish to stratify into a fixed number $k$ of defined strata but are ignorant of the population proportions $N_l/N = W_l$ in these strata and therefore cannot use the estimator (39.38). Accordingly we take a preliminary equal-probabilities sample of size $n_1$, which is found to be distributed over the $k$ desired strata with frequencies $n_{11}, n_{12}, \ldots, n_{1k}$, where $\sum_l n_{1l} = n_1$. The proportions $w_{1l} = n_{1l}/n_1$ are, of course, unbiassed estimators of the population proportions $W_l$, and it is therefore natural to use as our estimator of $\mu_y$

$$\hat\mu_{12} = \sum_{l=1}^{k} w_{1l}\,m_{2l}, \qquad (40.57)$$

where $m_{2l}$ is the mean of the $n_{2l}$ second-phase observations in the $l$th stratum. The question now arises whether $n_{2l}$ is to be a subsample of $n_{1l}$ or whether it is to be independently selected. In practice, the former is much more likely, since if the $W_l$ are unknown no complete listing of the strata can be available, and second-phase sampling must be based on the random sample in each stratum obtained at the first phase.

Phase. Furthermore, although the first phase is 
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lf We assume that the » are certain to be'so . ° Ut ^"'‘aneouslv 

the *«, that the observed values of the relation torn . 

in advance. u are no restri c , ion U p 0 hr? lded val »ea 

P ° n the fi«ng „f 

40.24 We now (cf. 39.35) use the symbol £ t0 den 

second phase, conditional upon the first-phase Cultr^ eXpeCtations at the 

expectations at the first phase. Using f 3 Q lo\ ., P fixed, and £ to denote 

so that /},i is unbiassed. Its variance, from ( 39 . 73 ) ^ ’ (40 ' 58 ) 

F( « = f{^ 1 , ) > + U { Ej 1!)} . 

o»..,™ -». ^ ^ > « wj ’; 

and hence * Un Y 

£Wi 2 )} = S£K)- 


1 _ M 2 1 


W/ 


(40.60) 


If we now assume each $N_l$ to be very large, the last factor on the right of (40.60) is negligible. What is more, the first-phase sampling is now effectively multinomial estimation of proportions, so that (5.80) applies and (40.60) becomes


E {V^)} = 2 j»j(! zEl)+ wj[ 4' 

12 I { «1 j n u 

The second term on the right of (40.59) is, using (10.16) and (5.80), 

v {E{M} = V{Z V>1 iPi} = 2 $ V H,)+22 [t lf x C (w lt ,w l3 ,) 

12 11 11 l p i 1 

l*P 


(40.61) 


= Z,?FJM-22 W 

l fl\ l V 

l*P 


l*p 

WjWy 

«i 


= I{S^-(S^) 2 }. 

Hi l l 

Putting (40.61-2) into (40.59), we find, since 2 = ft, 


(40.62) 


(40.63) 


It will be recalled from (39.46) and 39;j 9 “ nttmtified staple 

expresses the gain in precision of a U or strati e v 
when all stratum sizes are large and n x is samp e size. 



THE ADVANCED I HE known , (40.63) reduces to th 

ss -sSih si ■*i 


Wl 

(40.63) may be approximated by

$$V(\hat\mu_{12}) \doteq \frac{\sigma^2 - \sum_l W_l\sigma_l^2}{n_1} + \sum_l\frac{W_l^2\sigma_l^2}{n_{2l}}. \qquad (40.64)$$

An almost unbiassed estimator of (40.64) is obtained by substituting $w_{1l}$ for $W_l$, $s^2$ for $\sigma^2$, and $s_l^2$ for $\sigma_l^2$.
40.26 Suppose now that the cost function for the two-phase sample is

$$C = c_0 + n_1c_1 + \sum_l n_{2l}c_{2l}. \qquad (40.65)$$

(40.64-5) are of the form (39.50-1). It follows from (39.53) that the sample sizes which minimize $V(\hat\mu_{12})$ for fixed $C$ (or vice versa) are given by

$$n_1^2 \propto \frac{\sigma^2 - \sum_l W_l\sigma_l^2}{c_1}, \qquad n_{2l}^2 \propto \frac{W_l^2\sigma_l^2}{c_{2l}}, \qquad (40.66)$$

the constant of proportionality being obtained by (40.65) or (40.64), whichever is fixed.

(40.66) shows that at the second phase, observations should be distributed between the strata just as in ordinary stratified sampling allocation at (39.55) (though it must be remembered that only the neglect of a term of order $1/n_1$ has produced this simple result). The first-phase sample size increases with the numerator of the first term on the right of (40.64) (which is the excess variance resulting from the need to estimate the $W_l$ at the first phase) and decreases with the cost of sampling, both considerations in accord with intuition.
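The allocation rule can be illustrated numerically. The sketch below assumes the approximate variance consists of a between-strata component divided by the first-phase size plus the sum of squared-weight within-stratum variances divided by the second-phase sizes, together with a linear cost function; it then scales the optimal proportions so that a given budget is spent exactly. All argument names are hypothetical.

```python
from math import sqrt

def two_phase_allocation(budget, c0, c1, W, sigma2, sigma2_l, c2):
    """Sample sizes (n1, [n2l]) minimising the approximate variance for a fixed budget.

    W        : stratum proportions W_l
    sigma2   : overall population variance
    sigma2_l : within-stratum variances
    c1, c2   : unit costs at the first phase and, per stratum, at the second phase
    """
    between = sigma2 - sum(w * s for w, s in zip(W, sigma2_l))
    if between < 0:
        raise ValueError("within-stratum variances exceed the total variance")
    g1 = sqrt(between / c1)                         # n1 is proportional to g1
    g2 = [w * sqrt(s / c) for w, s, c in zip(W, sigma2_l, c2)]
    cost_per_unit_scale = g1 * c1 + sum(g * c for g, c in zip(g2, c2))
    scale = (budget - c0) / cost_per_unit_scale     # constant of proportionality
    return scale * g1, [scale * g for g in g2]
```

The returned sizes are real numbers; in practice they would be rounded, and a zero between-strata component would simply direct the whole budget to the second phase.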

40.27 Although the intention of our two-phase sampling is to improve estimation efficiency by use of stratification, we recall from 39.18 that even when the $W_l$ are known, the stratification itself may cause a loss of efficiency, though we saw in 39.19 that this could not happen if all the $N_l$ were large enough. However, the additional component of sampling variance due to the estimation of the $W_l$ at the first-phase sampling now opens the possibility of a loss of efficiency even for large $N_l$.

Consider an unstratified simple sample of $n$, with $n_l$ observations falling into the $l$th stratum. It is reasonable to assume that the overhead cost of the sample will be the same, $c_0$, and that the cost of an observation in the $l$th stratum will also remain unchanged at $c_{2l}$. But the $n_l$ are now random, with expectation $nW_l$, so that we must work with the expected cost

$$E(C_n) = c_0 + n\sum_l W_lc_{2l}. \qquad (40.67)$$

Equating (40.67) to (40.65), we have

$$n = \frac{n_1c_1 + \sum_l n_{2l}c_{2l}}{\sum_l W_lc_{2l}},$$

and thus the variance of the unstratified estimator will be (for large $N$)

$$V(m_y) = \frac{\sigma^2}{n} = \frac{\sigma^2\sum_l W_lc_{2l}}{n_1c_1 + \sum_l n_{2l}c_{2l}}. \qquad (40.68)$$

The ratio of two-phase stratified to unstratified sampling variance is, from (40.64),

$$\frac{V(\hat\mu_{12})}{V(m_y)} = \frac{\left\{\dfrac{\sigma^2 - \sum_l W_l\sigma_l^2}{n_1} + \sum_l\dfrac{W_l^2\sigma_l^2}{n_{2l}}\right\}\left(n_1c_1 + \sum_l n_{2l}c_{2l}\right)}{\sigma^2\sum_l W_lc_{2l}}. \qquad (40.69)$$

The numerator of (40.69) is the product $V(\hat\mu_{12})(C - c_0)$, which is minimized when $n_1$ and $n_{2l}$ are chosen to satisfy (40.66). By (39.52), this minimum value is given by

$$\frac{\min V(\hat\mu_{12})}{V(m_y)} = \frac{\left[\{(\sigma^2 - \sum_l W_l\sigma_l^2)c_1\}^{1/2} + \sum_l W_l\sigma_lc_{2l}^{1/2}\right]^2}{\sigma^2\sum_l W_lc_{2l}}. \qquad (40.70)$$

This seems to be the most useful form for the ratio of variances. If we again consider the numerator of (40.70), we see by the Cauchy inequality that it is no greater than

$$\left\{(\sigma^2 - \sum_l W_l\sigma_l^2) + \sum_l W_l\sigma_l^2\right\}\left\{c_1 + \sum_l W_lc_{2l}\right\},$$

so that (40.70) gives

$$\frac{\min V(\hat\mu_{12})}{V(m_y)} \le 1 + \frac{c_1}{\sum_l W_lc_{2l}}. \qquad (40.71)$$

Thus if $c_1 = 0$, two-phase stratified sampling with MV allocation of sample sizes is never worse than unstratified sampling. But if $c_1 = 0$ we can estimate the $W_l$ accurately at zero cost, so this is effectively ordinary stratified sampling. We have thus verified the conclusion of 39.19 with the additional consideration of variable costs in the different strata.

If $c_1 > 0$ in (40.70), it is possible for the unstratified sample to be more efficient, but (40.70) is evidently an increasing function of $c_1$, and if $c_1$ is small compared to the weighted average $\sum_l W_lc_{2l}$, the right of (40.71) can exceed unity by very little, so that there is, at worst, little efficiency to be lost by properly allocated two-phase stratified sampling. As a simple numerical example, put $c_1 = 1$, $c_{2l} = 6$, all $l$, $\sigma^2 = 10$, $\sigma_l^2 = 6$, all $l$. The value of (40.70) is then $(2+6)^2/(10 \times 6) = 1.07$. If instead $c_{2l} = 9$, (40.70) becomes $(2+7.35)^2/(10 \times 9) = 0.97$.
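The numerical example can be reproduced with a few lines of code. This sketch assumes the minimum variance ratio takes the form of a squared sum of square-root terms divided by the total variance times the weighted average second-phase cost, and compares it with the Cauchy upper bound; the function names are illustrative.

```python
from math import sqrt

def min_variance_ratio(c1, W, sigma2, sigma2_l, c2):
    """Minimum ratio of two-phase stratified to unstratified variance at equal expected cost."""
    between = sigma2 - sum(w * s for w, s in zip(W, sigma2_l))
    numerator = (sqrt(between * c1)
                 + sum(w * sqrt(s * c) for w, s, c in zip(W, sigma2_l, c2))) ** 2
    denominator = sigma2 * sum(w * c for w, c in zip(W, c2))
    return numerator / denominator

def cauchy_bound(c1, W, c2):
    """Upper bound 1 + c1 / (weighted average second-phase cost)."""
    return 1 + c1 / sum(w * c for w, c in zip(W, c2))
```

With two equal strata, unit first-phase cost, total variance 10 and within-stratum variances 6, the ratio is 64/60 when the second-phase cost is 6 and falls just below unity when it is 9, in agreement with the figures quoted in the text.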


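40.28 When the first phase of sampling is used to estimate the mean of an auxiliary variable, $\mu_x$, for a ratio or regression estimator at the second phase, (39.73) again enables us to evaluate the sampling variance, as in 40.24. We shall consider only the simplest case, using the biassed ratio estimator in two-phase sampling

$$\hat\mu_{12} = m_x^{(1)}\,\frac{m_y^{(2)}}{m_x^{(2)}},$$

where we now use superscripts to denote phases. If the two phases are independent equal-probabilities samples,

$$E(\hat\mu_{12}) = E(m_x^{(1)})\,E\left(\frac{m_y^{(2)}}{m_x^{(2)}}\right) = \mu_x\left\{\frac{\mu_y}{\mu_x} + O(n_2^{-1})\right\} = \mu_y + O(n_2^{-1}).$$

(39.73) gives

$$V(\hat\mu_{12}) = E\{V(\hat\mu_{12})\} + V\{E(\hat\mu_{12})\} = E\left\{(m_x^{(1)})^2\,V\left(\frac{m_y^{(2)}}{m_x^{(2)}}\right)\right\} + V\left\{m_x^{(1)}\frac{\mu_y}{\mu_x}\right\},$$

where we neglect terms of relative order $n_2^{-1}$. Thus

$$V(\hat\mu_{12}) \doteq V\left(\frac{m_y^{(2)}}{m_x^{(2)}}\right)\left\{V(m_x^{(1)}) + \mu_x^2\right\} + \frac{\mu_y^2}{\mu_x^2}\,V(m_x^{(1)}), \qquad (40.72)$$

and using (40.24), this becomes

$$V(\hat\mu_{12}) \doteq \frac{1}{n_2}\left\{1 + \frac{V(x)}{n_1\mu_x^2}\right\}\left\{V(y) + \frac{\mu_y^2}{\mu_x^2}V(x) - 2\frac{\mu_y}{\mu_x}C(y,x)\right\} + \frac{1}{n_1}\,\frac{\mu_y^2}{\mu_x^2}\,V(x). \qquad (40.73)$$

The term in $1/n_2$ on the right of (40.73) is simply (40.25) applied to the second-phase sample. As in (40.63), the first-phase sampling introduces a term in $1/n_1$ inflating this contribution, as well as a new contribution of order $1/n_1$. Since we have already neglected terms of relative order $1/n_2$, we also (since we assume $n_1 > n_2$) neglect the term in $1/(n_1n_2)$, obtaining the approximation

$$V(\hat\mu_{12}) \doteq \frac{1}{n_2}\left\{V(y) + \frac{\mu_y^2}{\mu_x^2}V(x) - 2\frac{\mu_y}{\mu_x}C(y,x)\right\} + \frac{1}{n_1}\,\frac{\mu_y^2}{\mu_x^2}\,V(x). \qquad (40.74)$$

If, instead of being independent, the second-phase sample is a subsample of the first, (40.72) is modified, $\mu_y/\mu_x$ there being replaced by $m_y^{(1)}/m_x^{(1)}$, so that the second term on its right-hand side becomes simply $V(m_y^{(1)}) = V(y)/n_1$. If $n_1$ is very much larger than $n_2$, the first term on the right of (40.72) has the same value as previously to our order of approximation, but if $n_2/n_1$ is appreciable, the approximation is improved by a correction $(1 - n_2/n_1)$ applied to the first term in (40.74), which then becomes

$$V(\hat\mu_{12}) \doteq \left(\frac{1}{n_2} - \frac{1}{n_1}\right)\left\{V(y) + \frac{\mu_y^2}{\mu_x^2}V(x) - 2\frac{\mu_y}{\mu_x}C(y,x)\right\} + \frac{V(y)}{n_1}. \qquad (40.75)$$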

Cochran (1963) and Yates (1960) give details of application of two-phase methods to regression estimation, though restrictive assumptions are necessary to obtain useful variance formulae.

Two-phase sampling generalizes naturally to multi-phase sampling, but little theoretical work has been done on this more general procedure. S. P. Ghosh (1961a) considers two-phase sampling where the object of the first phase is to form clusters for the second. Raj (1964) discusses the case where the first phase is used to determine probabilities of selection to be used at the second.

Domains of study

40.30 In our discussions of ratio estimation, we have allowed the auxiliary variable $x$ to be quite general. In practice, one of the most important situations is that in which $x$ is a 0-1 variable which counts whether the corresponding value of $y$ is or is not to be included in a particular part of the population. This may come about in the following ways:

(a) A population is sampled, but we are from the outset only interested in part of it; e.g. the human population aged 21 and over is sampled, but we are interested only in the ages 21-65. Here the sample size $n$ for the population of interest is clearly a random variable, and the sample mean for any variable measured in this population is of the form $\sum y_ix_i/\sum x_i$, where $x_i$ is 0 or 1 as above. If the population mean of $x$ (i.e. the proportion of the population aged 21-65) is known, all our foregoing theory can be applied.

(b) We are interested in the entire population from which we sample, but only part of a selected sample yields observations, owing to non-response (in human populations especially), loss of records, or incomplete fieldwork. Again, $n$ is a random variable, and the remarks under (a) apply.†

(c) We obtain observations from the whole sample taken from the population of interest, but we wish to evaluate the results for sub-groups of this population; e.g. we have a sample from a human population, and wish to calculate certain statistics for men only and for women only. If we had stratified the sample in advance into men and women, no new point would have arisen, since sample sizes for men and for women would be fixed. However, such stratification is not usually possible, so that these sample sizes are random variables (though their sum is not, in this simple case). More generally, the sample size in any unpredesignated sub-group must be a random variable.

40.31 The sub-group of interest is called a domain (of study). Of course, a stratum may itself be a domain, but no new theory is then required. We shall use domain to mean a sub-group whose sample frequency is a random variable, whatever the reason. Domains frequently cut across the strata and the various stage-units of a sample, and it is here that new points arise. Yates (1960) gives (as also in earlier editions of his book) a number of formulae for domains cutting across strata, for which Cochran
ms dook.; d nuu . r>urhin (1958^ treats these and some multi-stage 

(1963) gives some of ^ ^0 derlves^ome of Yates' results for covariances of domain 
“„d 5ST8Srtsl. Our treatment follows Durbin (19S8). 

† There is a complication here in the case of non-response, since non-response may be correlated with the value of $y$, so that the responding group cannot provide an unbiassed estimator of $\mu_y$ for the population as a whole.

40.32 Suppose, as in Chapter 39, that we have $k$ strata with population frequencies $N_l$, $l = 1, 2, \ldots, k$, and $\sum N_l = N$, while in the sample the frequencies are $n_l$ with $\sum n_l = n$. Now consider a particular domain, say $d$. We denote the population frequency in $d$ in the $l$th stratum by $N_l^{(d)}$, with $\sum N_l^{(d)} = N^{(d)}$ for the domain frequency in the entire population, while $n_l^{(d)}$ and $\sum n_l^{(d)} = n^{(d)}$ are similarly defined in the sample. Note that whereas the population stratum frequencies $N_l$ are known, the population stratum domain frequencies $N_l^{(d)}$ will generally not be known. Of course, unstratified random sampling is the case $k = 1$.

We define the variable $y^{(d)}$ to be equal to the observed variable $y$ for domain members, and to be zero for others. Thus

$$y_{lj}^{(d)} = h_{lj}\,y_{lj}, \qquad (40.76)$$

where

$$h_{lj} = \begin{cases} 1 & \text{within the domain,}\\ 0 & \text{outside the domain.}\end{cases} \qquad (40.77)$$

We then have

$$N_l^{(d)} = \sum_{j=1}^{N_l} h_{lj}, \qquad N^{(d)} = \sum_{l=1}^{k} N_l^{(d)} = \sum_{l=1}^{k}\sum_{j=1}^{N_l} h_{lj},$$

$$n_l^{(d)} = \sum_{j=1}^{n_l} h_{lj}, \qquad n^{(d)} = \sum_{l=1}^{k} n_l^{(d)} = \sum_{l=1}^{k}\sum_{j=1}^{n_l} h_{lj}. \qquad (40.78)$$

We further define the domain means within strata,

$$\mu_l^{(d)} = \sum_{j=1}^{N_l} y_{lj}^{(d)}\Big/N_l^{(d)}, \qquad (40.79)$$

and the overall domain mean

$$\mu^{(d)} = \sum_{l=1}^{k}\sum_{j=1}^{N_l} y_{lj}^{(d)}\Big/N^{(d)} = \sum_l N_l^{(d)}\mu_l^{(d)}\Big/N^{(d)}. \qquad (40.80)$$


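40.33 We now seek to estimate $\mu^{(d)}$ at (40.80). Consider first the case where the sampling is with equal probabilities using a USF, say $f = n/N$ as in Chapter 39. The ratio estimator

$$m^{(d)} = \sum_{l=1}^{k}\sum_{j=1}^{n_l} y_{lj}^{(d)}\Big/n^{(d)} = \sum_l\sum_j y_{lj}^{(d)}\Big/\sum_l\sum_j h_{lj} \qquad (40.81)$$

is the sample analogue of $\mu^{(d)}$. It is in essence (40.1) with numerator and denominator separately summed across strata, i.e. it is an example of the "combined" ratio estimator. Its expectation is, asymptotically, the ratio of the expectations of its numerator and denominator, so that

$$E(m^{(d)}) \sim E\Big(\sum_l\sum_{j=1}^{n_l} y_{lj}^{(d)}\Big)\Big/E\Big(\sum_l\sum_{j=1}^{n_l} h_{lj}\Big) = f\sum_l\sum_{j=1}^{N_l} y_{lj}^{(d)}\Big/f\sum_l\sum_{j=1}^{N_l} h_{lj} = \mu^{(d)}, \qquad (40.82)$$

using (40.78) and (40.80).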
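In computational terms the domain estimator is simply the mean of the observations that fall in the domain, pooled across strata. A minimal sketch, assuming a uniform sampling fraction so that no stratum weights are needed, and representing each sampled unit as a hypothetical pair (value, in-domain flag):

```python
def domain_mean(strata):
    """Pooled domain mean from a uniform-sampling-fraction stratified sample.

    strata: list of strata, each a list of (y, in_domain) pairs.
    """
    total = 0.0
    count = 0
    for stratum in strata:
        for y, in_domain in stratum:
            if in_domain:            # indicator h = 1 within the domain
                total += y           # numerator: sum of h * y
                count += 1           # denominator: sum of h
    if count == 0:
        raise ValueError("no sampled units fall in the domain")
    return total / count
```

The denominator is itself random, which is why the estimator is treated as a ratio rather than as an ordinary sample mean.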




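40.34 To find the variance of $m^{(d)}$, we put

$$z_{lj} = y_{lj}^{(d)} - h_{lj}\,\mu^{(d)}, \tag{40.83}$$

so that

$$m^{(d)} - \mu^{(d)} = \sum_{l}\sum_{j=1}^{n_l} z_{lj} \Big/ n^{(d)}. \tag{40.84}$$

Since

$$E(n^{(d)}) = f N^{(d)} = nN^{(d)}/N, \tag{40.85}$$

the variance of $m^{(d)}$ is asymptotically

$$V(m^{(d)}) \sim \Big(\frac{N}{N^{(d)}}\Big)^2 V(\bar z), \qquad \bar z = \sum_{l}\sum_{j=1}^{n_l} z_{lj}\Big/ n. \tag{40.86}$$

The variance on the right of (40.86) is that of the mean of a USF stratified sample. By (39.45), therefore, we have

$$V(m^{(d)}) \sim \frac{N-n}{n\,(N^{(d)})^2}\sum_{l}\Big\{\sum_{j=1}^{N_l} z_{lj}^2 - \frac{1}{N_l}\Big(\sum_{j=1}^{N_l} z_{lj}\Big)^2\Big\}, \tag{40.87}$$

where we neglect the fact that the stratum variances have $N_l - 1$ as divisor rather than $N_l$. From (40.83), (40.87) may be written

$$V(m^{(d)}) \sim \frac{N-n}{n\,(N^{(d)})^2}\sum_{l}\Big\{\sum_{j=1}^{N_l} h_{lj}\,(y_{lj}-\mu^{(d)})^2 - \frac{(N_l^{(d)})^2}{N_l}\,(\mu_l^{(d)}-\mu^{(d)})^2\Big\}, \tag{40.88}$$

using (40.78-9) and the fact that $h_{lj}^2 = h_{lj}$. The effect of the term $h_{lj}$ in the first summation over $j$ on the right of (40.88) is to convert the $y_{lj}$ in the succeeding parenthesis to $y_{lj}^{(d)}$ by (40.76), and to leave $\mu^{(d)}$ there unchanged, since by (40.80) this is a linear function of the $y_{lj}^{(d)}$ and $h_{lj}\,y_{lj}^{(d)} = y_{lj}^{(d)}$. This first summation may therefore be written

$$\sum_{j} h_{lj}\,(y_{lj}-\mu^{(d)})^2 = \sum_{j}(y_{lj}^{(d)}-\mu^{(d)})^2 = \sum_{j}(y_{lj}^{(d)}-\mu_l^{(d)})^2 + N_l^{(d)}(\mu_l^{(d)}-\mu^{(d)})^2, \tag{40.89}$$

by the usual sum of squares identity, the summations on the right extending over the $N_l^{(d)}$ domain members of the stratum. Putting (40.89) into (40.88), we obtain

$$V(m^{(d)}) \sim \frac{N-n}{n\,(N^{(d)})^2}\sum_{l}\Big\{\sum_{j}(y_{lj}^{(d)}-\mu_l^{(d)})^2 + N_l^{(d)}\Big(1-\frac{N_l^{(d)}}{N_l}\Big)(\mu_l^{(d)}-\mu^{(d)})^2\Big\}. \tag{40.90}$$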
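The algebra leading from (40.87) to (40.90) can be checked numerically for a single stratum. The sketch below is an illustration only: the function names and the data are hypothetical, and the value used for $\mu^{(d)}$ is an arbitrary number, since the identity holds whatever value is subtracted.

```python
def bracket_via_z(ys, hs, mu_d):
    """Stratum term of (40.87): sum z^2 - (sum z)^2 / N_l, z = h (y - mu_d)."""
    z = [h * (y - mu_d) for y, h in zip(ys, hs)]
    return sum(v * v for v in z) - sum(z) ** 2 / len(z)


def bracket_via_4090(ys, hs, mu_d):
    """Stratum term of (40.90): within-domain SS plus the between term."""
    dom = [y for y, h in zip(ys, hs) if h == 1]   # domain members only
    n_l, n_ld = len(ys), len(dom)                  # N_l and N_l^(d)
    mu_ld = sum(dom) / n_ld                        # mu_l^(d) of (40.79)
    within = sum((y - mu_ld) ** 2 for y in dom)
    return within + n_ld * (1 - n_ld / n_l) * (mu_ld - mu_d) ** 2


# Invented stratum of five units, three of them in the domain.
ys = [2.0, 5.0, 9.0, 4.0, 7.0]
hs = [1, 0, 1, 1, 0]
lhs = bracket_via_z(ys, hs, mu_d=6.0)
rhs = bracket_via_4090(ys, hs, mu_d=6.0)
```

The two values agree, which confirms that the factor $1 - N_l^{(d)}/N_l$ in (40.90) arises from combining the last term of (40.88) with the between-means term of (40.89).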


1 

/ 


40.35 The first term on the right of (40.90) is, if we write $n/N = n^{(d)}/N^{(d)}$ to our order of approximation,

$$\frac{N^{(d)}-n^{(d)}}{n^{(d)}\,(N^{(d)})^2}\sum_{l}\sum_{j}(y_{lj}^{(d)}-\mu_l^{(d)})^2. \tag{40.91}$$

As is evident from our derivation of (40.87) from (39.45), (40.91) is exactly the "variance" we should have arrived at if the stratum domain frequencies $n_l^{(d)}$ had been (wrongly) regarded as fixed. The second term in (40.90) therefore indicates the increase in variance attributable to variation in the $n_l^{(d)}$. This will be great if the stratum domain means $\mu_l^{(d)}$ differ substantially, particularly if the fractions of the strata frequencies within the domain, $N_l^{(d)}/N_l$, are small.

If we drop the factor $(1 - N_l^{(d)}/N_l)$ from the second term on the right of (40.90), and use the approximation in (40.91), the result is, to our degree of approximation, identical with the unstratified sample variance of Chapter 39. Thus if all the fractions $N_l^{(d)}/N_l$ are small, the variation in the $n_l^{(d)}$ removes the benefit of stratification from the sampling variance of the estimator. Only if the domain bulks large within at least some of the strata is much of the benefit retained.

An estimator of the sampling variance (40.90) can be derived in exactly the same way as (40.90) itself was; the details are left to the reader as Exercise 40.15.

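40.36 If the sampling fraction is not uniform, the estimator (40.81) must be changed to weight the stratum contributions properly, and we take

$$r = \sum_{l}\frac{N_l}{n_l}\sum_{j=1}^{n_l} y_{lj}^{(d)} \Big/ \sum_{l}\frac{N_l}{n_l}\,n_l^{(d)}. \tag{40.92}$$

The reader is asked to show in Exercise 40.16 that this is asymptotically unbiassed for $\mu^{(d)}$ with variance

$$V(r) \sim \frac{1}{(N^{(d)})^2}\sum_{l}\frac{N_l}{n_l}\Big(1-\frac{n_l}{N_l}\Big)\Big\{\sum_{j=1}^{N_l} z_{lj}^2 - \frac{1}{N_l}\Big(\sum_{j=1}^{N_l} z_{lj}\Big)^2\Big\}, \tag{40.93}$$

the generalization of (40.87), which reduces on substitution for $z_{lj}$ to

$$\frac{1}{(N^{(d)})^2}\sum_{l}\frac{N_l}{n_l}\Big(1-\frac{n_l}{N_l}\Big)\Big\{\sum_{j}(y_{lj}^{(d)}-\mu_l^{(d)})^2 + N_l^{(d)}\Big(1-\frac{N_l^{(d)}}{N_l}\Big)(\mu_l^{(d)}-\mu^{(d)})^2\Big\}, \tag{40.94}$$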


which generalizes (40.90). An estimator of (40.94) is 




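$$\hat V(r) = \Big(\sum_{l}\frac{N_l}{n_l}\,n_l^{(d)}\Big)^{-2}\sum_{l}\frac{N_l^2}{n_l(n_l-1)}\Big(1-\frac{n_l}{N_l}\Big)\Big\{\sum_{j}(y_{lj}^{(d)}-m_l^{(d)})^2 + n_l^{(d)}\Big(1-\frac{n_l^{(d)}}{n_l}\Big)(m_l^{(d)}-r)^2\Big\}, \tag{40.95}$$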


the derivation again being left to Exercise 40.16. 

Domains in multi-stage sampling 

40.37 We shall confine our attention to multi-stage sampling in which $s$ first-stage units are selected with replacement from $S$ units. Any number (including zero) of stages of sampling may follow thereafter, but we restrict ourselves further to self-weighting designs, in which the sample mean is the estimator of the corresponding population mean.

We wish to estimate the overall domain mean, written as before,

$$\mu^{(d)} = \sum_{i=1}^{S}\sum_{j}\cdots\sum_{p} y_{ij\ldots p}^{(d)} \Big/ N^{(d)}, \tag{40.96}$$

where

$$N^{(d)} = \sum_{i=1}^{S}\sum_{j}\cdots\sum_{p} h_{ij\ldots p} \tag{40.97}$$

is the total domain frequency as before, the $h_{ij\ldots p}$ being 0-1 variables as at (40.77).

40.38 The estimator is the sample analogue of $\mu^{(d)}$,

$$m^{(d)} = \sum_{i=1}^{s}\sum_{j}\cdots\sum_{p} y_{ij\ldots p}^{(d)} \Big/ n^{(d)}, \tag{40.98}$$

where

$$n^{(d)} = \sum_{i=1}^{s}\sum_{j}\cdots\sum_{p} h_{ij\ldots p}. \tag{40.99}$$

As at (40.82), to the first order of approximation,

$$E(m^{(d)}) \sim E\Big(\sum_{i=1}^{s}\sum_{j}\cdots\sum_{p} y_{ij\ldots p}^{(d)}\Big) \Big/ E(n^{(d)}) = f\sum_{i=1}^{S}\sum_{j}\cdots\sum_{p} y_{ij\ldots p}^{(d)} \Big/ f\sum_{i=1}^{S}\sum_{j}\cdots\sum_{p} h_{ij\ldots p} = \mu^{(d)}, \tag{40.100}$$

the common factor $f$ in numerator and denominator being the overall probability of selection for each value $y$ in the population.



40.39 Just as at (40.83), we define

$$z_{ij\ldots p} = y_{ij\ldots p}^{(d)} - h_{ij\ldots p}\,\mu^{(d)}, \tag{40.101}$$

and find as at (40.84) that

$$m^{(d)} - \mu^{(d)} = \frac{1}{n^{(d)}}\sum_{i=1}^{s}\sum_{j}\cdots\sum_{p} z_{ij\ldots p} = \frac{s}{n^{(d)}}\cdot\frac{1}{s}\sum_{i=1}^{s}\Big(\sum_{j}\cdots\sum_{p} z_{ij\ldots p}\Big). \tag{40.102}$$

Thus, proceeding immediately to the estimation of the variance of $m^{(d)}$, we have

$$V(m^{(d)}) \sim \Big\{\frac{s}{E(n^{(d)})}\Big\}^2 V(\bar z), \tag{40.103}$$

where $\bar z$ is the mean of the $s$ values $\sum_{j}\cdots\sum_{p} z_{ij\ldots p} = z_i$, say. Now from (39.109),

V(*>=n«) 

and in particular, if the probabilities of selection are the same at each first-stage drawtng, 

z • 

(39.110) gives in this case, with t i = -j, 


Thus (40.103) becomes 


m = ?w - 


~ £ (** z) 


(40.104)

Just as at (40.87-8), we now resolve the sum of squares

M (40.10S) 

M „/ (40.104-5) give, assuming that 

l r/ , id,, 

• f the second term on the right of (40.106) is V(« ) to out 

expectation of the . (o ^ )ef , hand side, we find 

irder of approximation. ^ (yf-»?,«“) 2 - «*»> 


' ' (7Z a ) i=l 

• • > j pnf > n ds on u {d) To our order of approximation 

2 J St"i “»U - .i.-■— 

%“) = £ M 8 - *i® ““O' (40.108) 

If there is no sampling after the first stage, (40.108) agrees to this order of approximation
with the result of Exercise 40.15.

40.40 If we had (wrongly) taken the n[ d> to be fixed, we should have found for 
the estimated “variance” of (40.98), from (39.110) with t i = y (d) /?i id) , 

1 C s \ ^ 

&*?,-=! £[* T m T (40 - 109 ) 

Comparison of (40.108) and (40.109) shows that the variation in the $n_i^{(d)}$ affects the variance by replacing the average domain frequency in a first-stage unit, $n^{(d)}/s$, by the individual first-stage unit domain frequencies, $n_i^{(d)}$. The increase in variance will be large only if the $n_i^{(d)}$ vary substantially and if they are negatively correlated with the first-stage unit domain totals, which is unlikely in practice.


EXERCISES 

40.1 Show from (4017) and (40.9) that 


fix 


“ an Wrwiimtely unbiassed 




estimator of ju y . 


40,2 Sh °w that the 


product estimator 


(Cf ' M * N * Murth y and Nanjamma, 1959) 


lly - m ym x /ix X) 










Ol' a 


[log 


fOU s 


to 
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hc ratio estimator (40.1), has bias C(,„, s, OR MATlON 

. f tt and hence rh,, 

fi, = |"(N-l)tnaTOx-(N_„ ) 2 Jw/< 
d Show further that, as IV-> oo, 

j5 un bi asSC ' 

^ n L V f4 A 


fr4i- ■ 
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/^CC 


slto'V i£ C *>■ X) ct; § - 7; e *“ «■ »■«*« error is given 

( 4 0 . 25 ); ^ hile lf " n*) 2^’ % 18 m01e efficient than the simple mean m y . 

(Robson, 1957) 

40.3 In inverse sampling (cf. Examples 9.13, 34.1) of a population of $N$ individuals with equal probabilities and with replacement, sampling continues until $(r+1)$ distinct individuals have been selected, when $(n+1)$ observations will have been made $(n \ge r)$. The last observation is ignored, leaving $r$ distinct values $y_i$, observed $n_i$ times respectively, with $\sum_{i=1}^{r} n_i = n$. Show that $t = \frac{1}{n}\sum_{i=1}^{r} n_i y_i$ is an unbiassed estimator of the population mean, and (cf. Exercise 39.12) that its variance is unbiassedly estimated by

$$\hat V(t) = \frac{1}{n(n-1)}\sum_{i=1}^{r} n_i\,(y_i - t)^2.$$

(Sampford (1962); Pathak (1964b) improves the estimator, cf. 39.5.)

40.4 Show that if $k$ strata of fixed sizes $N_l$ are formed by random subdivision of a population of size $N$, and then $n$ observations are sampled with equal probabilities without replacement using a USF, the variance of the estimator (39.38) over the entire procedure exactly equals that of the mean of an unstratified sample of the same size from the original population.

Show further that if any allocation but a USF is used in the strata, the variance of (39.38) is increased.

40, A sampling design, with one os more stages, selects . ta f age uni. with replacement 
from a single stratum of N units, using unequal probabtbt.es f, ( *Pr - ) " e * c ra ^ 
and subsequent stages are sampler 

sk 

bilities as in the original scheme, prfp £ has t he same expectation in 

sequent stages remain unchanged. Show that «„(*) = 

. u , tA = ‘ S « in the original design, and tlrat 

the modified design as has 0 w n^\ 


n ' 

£ N\-N 


\ 

J 


/ n \ 


f»‘- JNi „„ (zp 1 ’)} due to Stages after the first . 

V i-i JL [component of V{f.W » 

+n '"N(N-V 
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236 , if t he iVi are ch° s ® n ma ximum chance of being no lc 

• an 5 show that n l * . • given trn- 

4 °‘ 6 f WofsTngle'-stage sampling’ ^ dst 40.4). ises are given by Stuart (1964), 

sampIing me: *" a,so 

who develops * g f% 62 ).) 

y N- K- Ra0 individuals from a population of 

....... random sample oi n » Qf the remaining n-n ias 

probabilities ' sivins a subsample 

<"!<>» - 
The estimator of py is 


236 


:i ent 
no H, 


to n ' - - 

, , OQmn i e an d the subsample of non-respondents 

.ere », are the means °f »“biassed, with sampling variance 

spectively. Show, usmg (39.72-3), tnar n 

* / \ ^.2 On 


«f 

= -(nimi + Mzini)) 

n 




\ / n 

f t nnrt rr 2 i«s the nooulation variance among potential non- 

where ff 2 is the population variance of y a 2 w of t h e population. Show, using 

respondents in the population, who form a proportion W, ot m P P g 

39.20 that 7(a y ) is minimized for fixed expected total cost it we choose k y 

<*{?'-W& 

km " ^ + Cl (l-?F 2 j>‘ 


(cf. Hansen and Hurwitz, 1946) 


C 7 2 = ( 7.2 ana iV — >■ UU, anu a sample 15 uu&cii Willi Hit aam^ 
with & = &3iv> then its estimation efficiency is given by 

TF 2 (1 - T^ 2 )( 1 


and N- 


V ifiy\km) 


1 -- 


Amv , 


^ 2 + ( 1 -PF 2 )/A 2 mv 


40.8 In Exercise 40.7, show that if k = 1, so that non-respondents are fully sampled, if 

_T ~ nnma mcf QC T\/TT7’ PnmJu 


ise 40.7, show that it k = l, so tnar non-respuiiuenu> cue iui iy ^ampiea, ir 
oo, and a sample is taken with the same expected total cost as the MV sample 
n its estimation efficiencv is given bv 


ii 


V(jLy\k=\) . ... _ 

Show that if W 2 = 0-6 and Aj IV <3, the efficiency with k = 1 ^94 per cent, illustrating the 
relative insensitivity of efficiency to departures from k m . 

(Durbin, 1954) 

Its variance a 2 is the same onToth^or 15 ^ W1 ^ j qU ^ Payabilities on two successive occasions, 
the two occasions. The first samole is f •* and there is correlation q between the values on 

a facdon / of , h ; first samp,elSh mea°„ TonZ'lT™ ^ ^ SeCOnd ^ K * ai " S 
and replaces the remaining n(l-f) memh^ 1 f u oc casion, m 2 on the second occasion) 

sample of the same size, with mean m" T° , first sam P le (with mean m\) by a fresh 
mean on the second occasion are nu' and W ° mdependent estimators of the population 

a two-phase tegression estimator h, Y = *“■ 0«i 

occasion values upon the first-occJfonTal'ues'in tr ed . regression coefficient 6 tI of the second¬ 
ares in the nf retained observations Show that 


5 ! 












tfl 1 


W» th 


SAMPLE SURVEY THEORY: SUPPLEMENTARY INFORMATION 

that the MV linear combination of ,i l2 and m >, u 

4 hcP ij = -^- rt i U ~/){1—(l —/v>2\ 

* a -a-/)vr x,+ “u-a 4v> } <» 

ling variance 
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sai*P l 


v(fi) = -ilrtWV" 

n U-tt-/)V. 


40.10 .!» >^e —“ ‘ hat ,hC ^ ^ — **— - 

«ca»° ns m y di = m -_ m . 

d 2 = m” —m\ 

ar e independent. Show that the MV linear combination of d x and d 2 is 



, = _/_ ^ , Cl —/xi- g) , 

{i-e(i-/)} 1 {i-e(W)} 2 ’ 


vV ith sampling variance 


V(d) = 


2o 2 (1-g) 


n {i-e(i-/)V 

(Yates (1960); Patterson (1950) treats the theory for several occasions; Vos 
(1964) gives variance formulae for simultaneous sampling in time and space.) 

in .. \ cimole random sample of size n is drawn from an infinite population consisting 
40.11 A P fractions Di of the population, the achieved sample in the fth stratum 
0 f k strata contam stratum sample sizes are mi) feed in advance of sampling, 

being ni, Zj m 7t - 

Umentarv simple random sample of size mi-m is taken independently withm each 
so a supplemen y P sample member in the initial sample is c, and the 

stratum for which ««>««• ■*. P entary sample within the (th stratum is or, while 

p r su pts " member s the fth mratum (in the initial sample) is i <«. show that 

the value » £ a s “ r P‘ achieving the intended stratified sample is 

m - j+S [Prob I ■.<«*>+** 1 

and that if the m are large enough this is approximately 


E{C) = nc+Z{ci-c'i) 
l 


mi — npi 


J 


(m-npi)G{ [npi( y_ pi) ^ 

U /4 N ,, J m-npi W+'Hjni-npiVi' 

+ [mpi{ 1 'PM : z\[ n pi (1 -pi)f J J i 

e , if r { Show that if n is increased by unity, 
where G is the standardized normal d.fr. an g 
the change in E(C) is ^ [p r0 b (m < tm) («- c '^ + c ^ pl 

\f(C)>0 approximately satisfies 

and that the least value of n for which ( 

Uci - c \)pA {ni<m} = 
i yc-Zcipi) 


reducing to 


SapiP rob ^ I<WI1 ^ ° 
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when c, = 0 (i.e. when surplus observa i 


Prob {ni<m} 


ci —ci 

1 


when all a are equal, all 4 are equal, and al ? s reIated sequential sc h en , ( 

S°wWch ,he S whok°po n pulation is sampled until every nn rs achieved. 

< t rnlnac nnel ... » 


in wnicii a 

• _ (aq 44 _ 9 ) for the expected values and 

40.12 Using (40.33), verify the expressions (40.44 V) 

statistics discussed in 


varian 


‘ce 3 


40.12 Using pw.oo;, v,-, ~~ * 

of the three statistics discussed in 

40.13 Form the differences v(^j - ^ H 

that these are positive. 


hny 

\m x > 


■ V(u) in 40 . 9 , and sho- 
(Tin, 1965) 


>w 


40.14 In 40.9, show that the estimator 


b = 


Ply 


i , Sxv ^ 

Kl m x my) 


has to order n~ 2 the expected value 


m x ( si 

(1 + “i —5 

V mij 


m = j { 1 - (c„ - c s „) - 2 a? c 2 „ (c !0 - c u )| 

and the same variance as u defined at (40.32), so that b and u are virtually equivalent. 

(Tin (1965)—the estimator is due to E. M. L. Beale.) 


40.15 Show that (40.90) may be estimated by

$$\hat V(m^{(d)}) = \frac{N-n}{N\,(n^{(d)})^2}\sum_{l}\frac{n_l}{n_l-1}\Big\{\sum_{j}(y_{lj}^{(d)}-m_l^{(d)})^2 + n_l^{(d)}\Big(1-\frac{n_l^{(d)}}{n_l}\Big)(m_l^{(d)}-m^{(d)})^2\Big\},$$

where $m_l^{(d)}$ is the sample analogue of $\mu_l^{(d)}$.

(Durbin, 1958)

40.16 In 40.36, establish (40.94), the generalization of (40.90), and (40.95), the generalization of Exercise 40.15.

(Durbin, 1958)

40.17 Verify the numerical values for the mean-square error of the six estimators in (40.29).

41.1 In a broad sense nearly the whole of this volume is devoted to multivariate analysis; that is to say, to the analysis of systems in which each member bears the values of more than one measured or classificatory variable. Up to the present, however, we have usually managed to simplify the problem, either (as for sample surveys) by being concerned primarily with the estimation of some particular parameter such as a mean, or (as in experimental design) by arranging the variables so that estimators of regression coefficients are orthogonal and admit of isolation of individual classification effects. We must now go further and consider situations of greater generality in which the variables are interdependent. In this chapter we discuss some of the distributional problems which arise. Unless otherwise stated, the underlying distributions will be assumed to be multivariate normal. It is an unfortunate feature of this branch of the subject that, in other cases, very little is known about exact distribution theory.

41.2 In Chapter 15, Vol. 1, we wrote the ^-dimensional multivariate normal 
distribution in two forms: 

dF oc exp/ — / I X ) f, 

( 2j-it=i V a, j\ a k Jji.io,’ 

-oo^jq^co, j = 1, 2,..., p (41.1) 

$$dF = \frac{|\boldsymbol\alpha|^{\frac12}}{(2\pi)^{\frac12 p}}\exp\{-\tfrac12(\mathbf x-\boldsymbol\mu)'\boldsymbol\alpha(\mathbf x-\boldsymbol\mu)\}\prod_{j=1}^{p}dx_j \tag{41.2}$$

where $(\mu_j, \sigma_j^2)$ are the mean and variance of the $j$th variable and $\boldsymbol\alpha$ is the matrix inverse to the dispersion matrix.

We were not, in that chapter, much concerned with sampling problems, but we shall now require to distinguish between parent and sample values, or between parameters and estimators. We shall accordingly write $a_{jk}$ for the sample value of $\alpha_{jk}$, and $c_{jk}$ for the sample covariance whose parent value is $\gamma_{jk}$, so that

$$\gamma_{jk} = \rho_{jk}\,\sigma_j\sigma_k, \tag{41.3}$$

$$c_{jk} = r_{jk}\,s_j s_k. \tag{41.4}$$

The dispersion matrix which, in Chapter 15, we wrote as $\mathbf V$ will now be written $\boldsymbol\gamma$, so that we have

$$\boldsymbol\alpha = \boldsymbol\gamma^{-1}. \tag{41.5}$$

We recall from (15.15) (Vol. 1) that the characteristic function of (41.2) is given by

$$\phi(\mathbf t) = \exp(-\tfrac12\mathbf t'\boldsymbol\gamma\mathbf t)\exp(i\mathbf t'\boldsymbol\mu). \tag{41.6}$$

41.3 A sample of $n$ values, typified by $x_{jl}$, $j = 1, 2, \ldots, p$; $l = 1, 2, \ldots, n$, will yield a likelihood function which is the product of $n$ terms of type (41.1), and its logarithm will be the sum of $n$ terms. Denoting by $\mathrm S$ summation over the sample, and by $\Sigma$ summation over variables, we find

$$\mathrm S\sum_{k}\alpha_{jk}\,(x_k-\mu_k) = 0, \tag{41.7}$$

leading to

$$\mathrm S\,(x_k-\mu_k) = 0, \tag{41.8}$$

so that

$$\hat\mu_k = \bar x_k, \qquad k = 1, 2, \ldots, p. \tag{41.9}$$

It is no surprise to find that the sample mean is the ML estimator of the parent mean.

41.4 For the parameter $\alpha_{jk}$ we have

$$\frac{n}{2}\,\frac{1}{|\boldsymbol\alpha|}\,\frac{\partial|\boldsymbol\alpha|}{\partial\alpha_{jk}} - \tfrac12\,\mathrm S\,(x_j-\mu_j)(x_k-\mu_k) = 0. \tag{41.10}$$

If $A_{jk}$ is the co-factor of $\alpha_{jk}$ in $|\boldsymbol\alpha|$, we find on substituting for $\mu$ the corresponding $\bar x$,

$$A_{jk}/|\boldsymbol\alpha| = \frac{1}{n}\,\mathrm S\,(x_j-\bar x_j)(x_k-\bar x_k) = c_{jk}. \tag{41.11}$$

It follows (*) that

$$\hat\gamma_{jk} = c_{jk}. \tag{41.12}$$

In particular, the sample variances are ML estimators of the parent variances, and we also have for the correlations

$$\hat\rho_{jk} = r_{jk} = \frac{\mathrm S\,(x_j-\bar x_j)(x_k-\bar x_k)}{\{\mathrm S\,(x_j-\bar x_j)^2\;\mathrm S\,(x_k-\bar x_k)^2\}^{\frac12}}. \tag{41.13}$$

This applies when all the parameters are under estimate. We shall not be concerned with other cases, which are of very minor practical importance, but see Examples 18.14-15 (Vol. 2) and Exercise 18.14 for the bivariate case.
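The ML estimators (41.9), (41.12) and (41.13) are simple to compute. The following is a minimal sketch using only the standard library; the function name `ml_dispersion` and the data are hypothetical. Note the divisor $n$, not $n - 1$, in the dispersions.

```python
import math


def ml_dispersion(data):
    """ML estimates for a multivariate normal sample.

    data: list of p variables, each a list of n observations
    (hypothetical layout). Returns (means, c, r), where c uses the
    divisor n as at (41.11) and r is the correlation matrix of (41.13).
    """
    p, n = len(data), len(data[0])
    means = [sum(v) / n for v in data]
    c = [[sum((data[j][l] - means[j]) * (data[k][l] - means[k])
              for l in range(n)) / n
          for k in range(p)] for j in range(p)]
    r = [[c[j][k] / math.sqrt(c[j][j] * c[k][k]) for k in range(p)]
         for j in range(p)]
    return means, c, r


# Invented bivariate sample of four observations.
means, c, r = ml_dispersion([[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]])
```

Because the same divisor appears in numerator and denominator of (41.13), the correlation estimate does not depend on whether $n$ or $n - 1$ is used.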


41.5 In setting confidence intervals for these parameters we encounter the same difficulties as in the univariate case, requiring distributions of the "Student" or $F$ type. We also have a new problem, that of setting simultaneous confidence intervals to the components of a vector. Consider, for example, the estimation of means when parent dispersions are known.

We saw in Chapter 15 that the variables in (41.2) could, by a linear transformation, be reduced to independent normal variables with unit variances. It follows that

$$(\mathbf x-\boldsymbol\mu)'\boldsymbol\gamma^{-1}(\mathbf x-\boldsymbol\mu) \tag{41.14}$$

is distributed as $\chi^2$ with $p$ degrees of freedom. We shall show shortly (41.6) that the distribution of means is the same as that of the original variables, except that the dispersion matrix is $\boldsymbol\gamma/n$. It follows, in view of (41.5), that

$$n(\bar{\mathbf x}-\boldsymbol\mu)'\boldsymbol\gamma^{-1}(\bar{\mathbf x}-\boldsymbol\mu)$$

is distributed as $\chi^2$ with $p$ d.fr. Thus to a given probability level $P$ we may make assertions of the type

$$\mathrm{Prob}\,\{n(\bar{\mathbf x}-\boldsymbol\mu)'\boldsymbol\gamma^{-1}(\bar{\mathbf x}-\boldsymbol\mu) \le \chi^2_P\} = P. \tag{41.15}$$

Since we are assuming $\boldsymbol\gamma$ known, this sets up a confidence region for the $\mu$'s in the form of a quadric in $p$ dimensions. The practical interpretation of the result requires delicate handling. We shall consider questions of bias in estimation in the next chapter. For the present we are concerned with distributions.

(*) We are here appealing to a theorem, easily proved (cf. 8.9), that if a set of parameters are functions of $\theta_1, \theta_2, \ldots, \theta_m$, then the ML estimators of those parameters are obtained by substituting in the functional relations the ML estimators of the $\theta$'s. It also follows that the ML estimators of partial and multiple correlations are given by the corresponding sample statistics.
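As an illustration of an assertion of type (41.15), the sketch below treats the case $p = 2$, where the $\chi^2$ quantile has a closed form because the distribution function for 2 d.fr. is $1 - e^{-x/2}$. The function names and the numerical inputs are hypothetical, and $\boldsymbol\gamma$ is assumed known.

```python
import math


def mean_region_statistic(xbar, mu, gamma, n):
    """n (xbar - mu)' gamma^{-1} (xbar - mu) for p = 2, as in (41.15)."""
    (a, b), (c, d) = gamma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # 2x2 inverse
    dx = [xbar[0] - mu[0], xbar[1] - mu[1]]
    quad = sum(dx[i] * inv[i][j] * dx[j]
               for i in range(2) for j in range(2))
    return n * quad


def chi2_quantile_2df(prob):
    """Quantile of chi-square with 2 d.fr.: solves 1 - exp(-x/2) = prob."""
    return -2.0 * math.log(1.0 - prob)


# Invented values: is mu = (0, 0) inside the 95 per cent region?
stat = mean_region_statistic([0.3, -0.1], [0.0, 0.0],
                             [[1.0, 0.5], [0.5, 1.0]], n=20)
inside = stat <= chi2_quantile_2df(0.95)
```

The set of $\boldsymbol\mu$ for which `inside` is true is the interior of an ellipse centred at $\bar{\mathbf x}$, the two-dimensional case of the quadric mentioned in the text.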

Wishart’s distribution 

41.6 We now proceed to investigate the joint distribution of means and dispersions in multivariate normal variation. Suppose we have a sample of $n$ individuals. Writing $x_{jl}$ for the $l$th observation on the $j$th variate, we may array the matrix of observations as

$$\mathbf x = (x_{jl}) = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ \vdots & \vdots & & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pn} \end{pmatrix}. \tag{41.16}$$

The frequency distribution of the sample is then given by

$$dF = \frac{|\boldsymbol\alpha|^{\frac12 n}}{(2\pi)^{\frac12 np}}\exp\Big\{-\tfrac12\sum_{l=1}^{n}\sum_{j,k=1}^{p}\alpha_{jk}(x_{jl}-\mu_j)(x_{kl}-\mu_k)\Big\}\prod_{j,l}dx_{jl}. \tag{41.17}$$

We already know from Example 11.7, and 16.25, in Vol. 1 that for $p = 1, 2$, the distribution can be split into two independent distributions, one of means and the other of dispersions. We prove, first of all, that the same is true of any value of $p$. We have the familiar algebraic identity

$$\sum_{l}\sum_{j,k}\alpha_{jk}(x_{jl}-\mu_j)(x_{kl}-\mu_k) = \sum_{l}\sum_{j,k}\alpha_{jk}(x_{jl}-\bar x_j)(x_{kl}-\bar x_k) + n\sum_{j,k}\alpha_{jk}(\bar x_j-\mu_j)(\bar x_k-\mu_k). \tag{41.18}$$

Thus the exponent in (41.17) factorizes into two components. We now have to consider the differential elements. It will be convenient to make on each variable an orthogonal Helmert transformation of the type used in Example 11.3,

$$\begin{aligned} y_1 &= \frac{1}{\sqrt 2}(x_1 - x_2), \\ y_2 &= \frac{1}{\sqrt 6}(x_1 + x_2 - 2x_3), \\ &\;\;\vdots \\ y_{n-1} &= \frac{1}{\sqrt{n(n-1)}}\{x_1 + x_2 + \ldots + x_{n-1} - (n-1)x_n\}, \\ y_n &= \frac{1}{\sqrt n}(x_1 + x_2 + \ldots + x_n) = \sqrt n\,\bar x, \end{aligned} \tag{41.19}$$

where the suffixes are sample labels. The inverse relationship is

$$\begin{aligned} x_1 &= \frac{1}{\sqrt 2}y_1 + \frac{1}{\sqrt 6}y_2 + \ldots + \frac{1}{\sqrt{n(n-1)}}y_{n-1} + \frac{1}{\sqrt n}y_n, \\ x_2 &= -\frac{1}{\sqrt 2}y_1 + \frac{1}{\sqrt 6}y_2 + \ldots + \frac{1}{\sqrt{n(n-1)}}y_{n-1} + \frac{1}{\sqrt n}y_n, \\ &\;\;\vdots \\ x_n &= -\frac{n-1}{\sqrt{n(n-1)}}y_{n-1} + \frac{1}{\sqrt n}y_n. \end{aligned} \tag{41.20}$$

The Jacobian of the transformation is of unit modulus, and the differential element in (41.17) becomes simply

$$\prod_{j=1}^{p}\prod_{l=1}^{n}dy_{jl}. \tag{41.21}$$

Looking now to (41.18) and remembering that $y_{jn} = \sqrt n\,\bar x_j$, we see that the second factor is a function only of $y_{jn}$ $(j = 1, 2, \ldots, p)$. The first factor, in virtue of (41.20), depends only on $y_1, \ldots, y_{n-1}$.

Hence we may factorize off from the original density element the second term in (41.18) and an associated differential element in the $\bar x$'s. With an appropriate adjustment to the constant factor we then obtain for the means

$$dF = \frac{n^{\frac12 p}\,|\boldsymbol\alpha|^{\frac12}}{(2\pi)^{\frac12 p}}\exp\Big\{-\tfrac12 n\sum_{j,k}\alpha_{jk}(\bar x_j-\mu_j)(\bar x_k-\mu_k)\Big\}\prod_{j=1}^{p}d\bar x_j. \tag{41.22}$$

The joint distribution of means is, in fact, the same as that of the original variables, apart from the factor in $n$.
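The properties of the Helmert transformation (41.19) used in the argument above can be verified numerically. The sketch below is illustrative only (the function names and data are hypothetical); it builds the $n \times n$ Helmert matrix, and checks orthogonality, the relation $y_n = \sqrt n\,\bar x$, and the fact that $y_1, \ldots, y_{n-1}$ carry the sum of squares about the mean.

```python
import math


def helmert(n):
    """Rows of the n x n Helmert matrix in the order of (41.19)."""
    rows = []
    for i in range(1, n):
        scale = 1.0 / math.sqrt(i * (i + 1))
        # i equal positive entries, then -i, then zeros
        rows.append([scale] * i + [-i * scale] + [0.0] * (n - i - 1))
    rows.append([1.0 / math.sqrt(n)] * n)  # last row gives sqrt(n) * mean
    return rows


def transform(matrix, x):
    return [sum(h * v for h, v in zip(row, x)) for row in matrix]


x = [3.0, 1.0, 4.0, 1.0, 5.0]  # invented observations
n = len(x)
H = helmert(n)
y = transform(H, x)

mean = sum(x) / n
ss_about_mean = sum((v - mean) ** 2 for v in x)
ss_from_y = sum(v * v for v in y[:-1])  # y_1, ..., y_{n-1} only

# Gram matrix H H', which should be the identity for an orthogonal matrix.
gram = [[sum(a * b for a, b in zip(H[i], H[j])) for j in range(n)]
        for i in range(n)]
```

Orthogonality is what makes the Jacobian of unit modulus, and the last two checks are what allow the density to split into a factor in the means and a factor in the dispersions.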

41.7 The distribution of the sample variances and covariances is thus confined to the $(n-1)$-spaces orthogonal to the sample means. Since the orthogonal transformation is simply a rotation of axes, it leaves distances and angles invariant. Since variances and covariances are functions of these alone (cf. Example 11.7 and 16.24), they too are invariant. The non-differential part arising from (41.18) and (41.22) may be written

$$\frac{|\boldsymbol\alpha|^{\frac12(n-1)}}{(2\pi)^{\frac12(n-1)p}}\exp\Big(-\tfrac12 n\sum_{j,k}\alpha_{jk}c_{jk}\Big). \tag{41.23}$$

Our principal problem is to evaluate the differential element in terms of the $c$'s. Write

$$u_{jl} = y_{jl}/\sqrt n. \tag{41.24}$$

Then the covariance $c_{jk}$ is given by

$$c_{jk} = \sum_{l=1}^{n-1}u_{jl}\,u_{kl}. \tag{41.25}$$

freedomlAhe dl^ the sample number, not the degree of 

ireeaom of the dispersions. We require that 

one foTiTat iTth^ ° f ™ ‘akep flat spaces of « -1 dimensions, 

b T pre ;r d by p >- p *.^ we 

feed P, and P 2 , and so on W f i, ,? A’ then that of *a for feed P„ then that of Pj for 
of p i> p i, ... , P„. ' sha then multl ply all these together to find the variation 


% 

A 
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paces superposed on one another. ™agine the 

p 0P« ■■■’ P " f '2 Pm -" the , POl , rlt varies on a hyoersnh an S ,< * P.OP,, 

f'et the length of the peipendicular line from P on l ? he * e of n ~ m dimensions. 

) L 0,Pv • ' l J m "h be r w ' J h ^ n the “ contend’ (volumel b / Per ? lane det ermined 
Z p is that of the surface of a hypersphere in *$ ° f Vanatlon permissible 
f ^h is<*> P m W ' m -ith radius , m , 

— _ _m 

V {l{n~m)Y' (41.26) 

> Je^ Consider theLnsforLtbn^the P ^ ' 

I based on (41.25), y ’ r * ' ’ • ’ J -> 




tIC™,- — iS 


»mi "‘'mi ~ ^ U mk U jk , j = 1 } 2, . 

fc — 1 


The Jacobian of the transformation is given by 

M ll> %2> 

J = " * > ^ mm) _ 


d(u 


ml> 




m. 


u 


(41.27) 


lm 


U 


21) ^22) 




2 m 


2w m i, 2 m 


m2> 


. 2m 


(41.28) 


mm | 


and this is equal to 2u w where v m is the volume of the parallelotope (the m-dimensional 
parallelogram) determined by O, P u ..., P m . Thus the differential element is 

! “ 


2 %n 3= 


n at mj . 


On multiplication by (41.26) the total variation of P m is then given by 


~r\{n—m) m— m—1 m 

L m _TT Jt 

r{!(«-w)K i=i mj " 


But we have 




V. 


m 


_ 


m -1 


12 - 


= V 


and, from (41.27), I %mj I 1 u mi 

The element of variation (41.30) then becomes 

^- w) <- w " 2 n at 


,71 —m —1 


r{K»-m)} 


my 

We now multiply expressions of type (41.33) for m 1> > » p 

„,,™r L those in *„ (which is unity) and ®„ and we find 


cancel except for those in (which 


v 


From (41.32) we have 


n tevn-p-v h n 

II Y{l{n-j)} 

5=1 vl = 1^-1 = 


(41.29) 

(41.30) 

(41.31) 

(41.32) 

(41.33) 
The terms in v 

(41.34) 

(41.35) 


<*> Cf. Kendall, M. G., A Course in the Geometry 


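of n Dimensions, p. 42.

Combining the differential element with (41.23), we then have

$$dF = \frac{n^{\frac12 p(n-1)}\,|\boldsymbol\alpha|^{\frac12(n-1)}\,|\mathbf c|^{\frac12(n-p-2)}}{2^{\frac12 p(n-1)}\,\pi^{\frac14 p(p-1)}\prod_{j=1}^{p}\Gamma\{\frac12(n-j)\}}\exp\Big(-\tfrac12 n\sum_{j,k}\alpha_{jk}c_{jk}\Big)\prod_{j \le k}dc_{jk}. \tag{41.36}$$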
m ” j)} (UA1) in Example 11.7 is the casep „ ,, 

This is Wishart’s 0followmay prefer to cou^ 
(H&j find the geometrical me o P ^ (1!W 8) ^ 


1,0.54) is a s ' m P ,e ,X” eometrical line or W u “ . , g) _ From many poi„ ts 

1 '-■•“ 1 * 

dimensional variation.

41.8 Let us note some minor but not unimportant details concerning the distribution:

(a) In the exponent of (41.36) we are summing over all $j$, $k$ and, since $\alpha_{jk} = \alpha_{kj}$ and $c_{jk} = c_{kj}$, there are $p$ terms of type $\alpha_{jj}c_{jj}$ and $\frac12 p(p-1)$ of type $2\alpha_{jk}c_{jk}$. Thus, for example, for $p = 2$ the exponent is $-\frac12 n(\alpha_{11}c_{11} + 2\alpha_{12}c_{12} + \alpha_{22}c_{22})$.

(b) In the differential element there are ip(f+ ] ) terms ’ "° P ' ’ 6 

«srsi.^ z svzzs gstst 

J'.'.t ;S,“' n£ ...i..,, 

c in order to obtain the marginal distributmns of others 
(d) We have defined the sample variances and covariances by dividing the appropriate
product-sum by n. We may, if we prefer, divide by n - 1, in which case appropriate
adjustments have to be made in (41.36). The reader should watch this point in
consulting the literature, because usage varies.




'm 


(41.37) 


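41.9 We can now derive the characteristic function of the Wishart distribution. Writing a single integral sign to denote integration over the domain of $c$'s, we have from (41.36)

$$\int|\mathbf c|^{\frac12(n-p-2)}\exp\Big(-\tfrac12 n\sum_{j,k}\alpha_{jk}c_{jk}\Big)\prod_{j \le k}dc_{jk} = k\,|\boldsymbol\alpha|^{-\frac12(n-1)}, \tag{41.37}$$

where $k$ is some constant. If we replace

$$\alpha_{jj} \text{ by } \alpha_{jj} - 2i\theta_{jj}/n, \qquad \alpha_{jk} \text{ by } \alpha_{jk} - i\theta_{jk}/n, \quad j \ne k,$$

the exponent under the integral sign gives us the c.f. of the $c$'s with $i\theta_{jk}$ for the usual imaginary dummy variable $it$. Making the substitutions on the right in (41.37) and adjusting the constant to make $\phi$ unity for zero $\theta$, we have

$$\phi(\theta) = \frac{|\boldsymbol\alpha|^{\frac12(n-1)}}{\begin{vmatrix} \alpha_{11} - 2i\theta_{11}/n & \alpha_{12} - i\theta_{12}/n & \cdots \\ \alpha_{21} - i\theta_{21}/n & \alpha_{22} - 2i\theta_{22}/n & \cdots \\ \vdots & \vdots & \ddots \end{vmatrix}^{\frac12(n-1)}}. \tag{41.38}$$

This is a useful substitutional device, which we shall employ again, avoiding the problem of actually integrating over the domain of the $c$'s.

Example 41.1

In the bivariate case $(p = 2)$ we have, in the usual notation (with unit variances),

$$\boldsymbol\alpha = \begin{pmatrix} \dfrac{1}{1-\rho^2} & \dfrac{-\rho}{1-\rho^2} \\ \dfrac{-\rho}{1-\rho^2} & \dfrac{1}{1-\rho^2} \end{pmatrix}, \qquad |\boldsymbol\alpha| = (1-\rho^2)^{-1}, \qquad |\mathbf c| = s_1^2 s_2^2(1-r^2).$$

Thus (41.36) reduces to

$$dF = \frac{(\frac12 n)^{n-1}\,s_1^{n-4}s_2^{n-4}(1-r^2)^{\frac12(n-4)}}{\pi^{\frac12}(1-\rho^2)^{\frac12(n-1)}\,\Gamma\{\frac12(n-1)\}\,\Gamma\{\frac12(n-2)\}}\exp\Big\{-\frac{n}{2(1-\rho^2)}(s_1^2 - 2\rho r s_1 s_2 + s_2^2)\Big\}\,dc_{11}\,dc_{12}\,dc_{22}.$$

On using the duplication formula

$$\Gamma\{\tfrac12(n-1)\}\,\Gamma\{\tfrac12(n-2)\} = \frac{\pi^{\frac12}\,\Gamma(n-2)}{2^{n-3}},$$

we find

$$dF = \frac{n^{n-1}\,s_1^{n-4}s_2^{n-4}(1-r^2)^{\frac12(n-4)}}{4\pi\,(1-\rho^2)^{\frac12(n-1)}\,\Gamma(n-2)}\exp\Big\{-\frac{n}{2(1-\rho^2)}(s_1^2 - 2\rho r s_1 s_2 + s_2^2)\Big\}\,dc_{11}\,dc_{12}\,dc_{22}, \tag{41.39}$$

which reduces to the form found in 16.26.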


Example 41.2

Let us now consider the moments of the distribution of the covariance when $p = 2$. From (41.38) we have, putting $\theta_{11} = \theta_{22} = 0$,

$$\phi(\theta_{12}) \propto \begin{vmatrix} \dfrac{1}{1-\rho^2} & -\dfrac{\rho}{1-\rho^2} - \dfrac{i\theta_{12}}{n} \\ -\dfrac{\rho}{1-\rho^2} - \dfrac{i\theta_{12}}{n} & \dfrac{1}{1-\rho^2} \end{vmatrix}^{-\frac12(n-1)}. \tag{41.40}$$

Expanding and evaluating the constant from the consideration that $\phi(0) = 1$ we find

$$\phi(\theta_{12}) = \Big\{1 - \frac{2i\rho\,\theta_{12}}{n} + \frac{(1-\rho^2)\,\theta_{12}^2}{n^2}\Big\}^{-\frac12(n-1)}. \tag{41.41}$$

Taking logarithms and evaluating coefficients of $\theta_{12}$ we find

$$\kappa_1 = \frac{n-1}{n}\,\rho, \tag{41.42}$$

$$\kappa_2 = \frac{n-1}{n^2}\,(1+\rho^2), \tag{41.43}$$

$$\kappa_3 = \frac{2(n-1)}{n^3}\,\rho\,(3+\rho^2), \tag{41.44}$$

$$\kappa_4 = \frac{6(n-1)}{n^4}\,(1+6\rho^2+\rho^4). \tag{41.45}$$

In standard measure the distribution tends to normality as $n$ tends to infinity, but for finite $n$

$$\beta_1 = \frac{4}{n-1}\,\frac{\rho^2(3+\rho^2)^2}{(1+\rho^2)^3}, \tag{41.46}$$

$$\beta_2 = 3 + \frac{6}{n-1}\,\frac{1+6\rho^2+\rho^4}{(1+\rho^2)^2}. \tag{41.47}$$

Thus, even for $\rho = 0$, the distribution, though symmetrical, is not normal. We have derived these results differently in Example 13.3, Vol. 1.
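The cumulants (41.42) to (41.45) can be checked in exact arithmetic. The quadratic in (41.41) factorizes as $\{1 - i(1+\rho)\theta_{12}/n\}\{1 + i(1-\rho)\theta_{12}/n\}$, so that expanding the logarithm gives $\kappa_r = \frac12(n-1)(r-1)!\,\{(1+\rho)^r + (\rho-1)^r\}/n^r$. This general expression is derived here for the purpose of the check and is not quoted from the text; the function names below are likewise hypothetical.

```python
from fractions import Fraction
from math import factorial


def kappa_from_factors(r, n, rho):
    """r-th cumulant obtained by expanding log of (41.41) (derived form)."""
    return (Fraction(n - 1, 2) * factorial(r - 1)
            * ((1 + rho) ** r + (rho - 1) ** r) / Fraction(n) ** r)


def kappa_stated(n, rho):
    """Cumulants as given at (41.42)-(41.45)."""
    return [
        (n - 1) * rho / Fraction(n),
        (n - 1) * (1 + rho ** 2) / Fraction(n) ** 2,
        2 * (n - 1) * rho * (3 + rho ** 2) / Fraction(n) ** 3,
        6 * (n - 1) * (1 + 6 * rho ** 2 + rho ** 4) / Fraction(n) ** 4,
    ]


n, rho = 10, Fraction(1, 3)  # arbitrary illustrative values
derived = [kappa_from_factors(r, n, rho) for r in range(1, 5)]
stated = kappa_stated(n, rho)

# Excess kurtosis kappa_4 / kappa_2^2 at rho = 0, cf. (41.47).
k0 = kappa_stated(n, Fraction(0))
excess_at_zero = k0[3] / k0[1] ** 2
```

At $\rho = 0$ the excess is $6/(n-1)$, which is positive for every finite $n$; this is the sense in which the distribution is symmetrical but not normal.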

Wishart (1929) gave the formulae explicitly for ^<8 as tar as tnose ot the fou^ ; 

order. 

The additive property of Wishart distributions 

41.10 One property of the Wishart distribution, analogous to that of $\chi^2$ in one dimension, is worth noticing.

Suppose we have two samples, $n_1$ and $n_2$ in number, from the same multivariate distribution. If we pool them, of course, the dispersions of the total sample will follow the Wishart distribution for a sample of $n_1 + n_2$. But we may also consider the joint distribution of the dispersions from each sample. If we form a new dispersion matrix by adding corresponding dispersions, i.e.

$$c_{jk} = c_{jk}^{(1)} + c_{jk}^{(2)}, \tag{41.48}$$

where the superfixes refer to the first and second sample, then the $c_{jk}$'s are distributed in Wishart's form with $n_1 + n_2 - 1$ instead of $n$.

This is perhaps most easily seen from the characteristic function. If we adjoin the distributions of $c^{(1)}$ and $c^{(2)}$ it will be clear, as in (41.37), that the c.f. of $c_{jk}$ is of the same form as (41.38).

41.11 It would add a pleasant completeness to the sampling theory of dispersions if we could proceed from the Wishart distribution and deduce the distribution of particular functions of the variances and covariances by integrating over appropriate domains to eliminate unwanted variables. Unfortunately this is prohibitively complicated in general. As in Example 41.2, we can obtain the moments and product-moments of sample dispersions; finding the explicit distributions is a harder matter.


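41.12 Consider the dispersion determinant $|\mathbf c|$. From (41.36), the integral being taken over the variances and covariances,

$$\int\frac{n^{\frac12 p(n-1)}\,|\boldsymbol\alpha|^{\frac12(n-1)}\,|\mathbf c|^{\frac12(n-p-2)}}{2^{\frac12 p(n-1)}\,\pi^{\frac14 p(p-1)}\prod_{j=1}^{p}\Gamma\{\frac12(n-j)\}}\exp\Big(-\tfrac12 n\sum_{j,k}\alpha_{jk}c_{jk}\Big)\prod_{j \le k}dc_{jk} = 1. \tag{41.49}$$

Writing $\beta_{jk} = n\alpha_{jk}$, so that $|\boldsymbol\beta| = n^p|\boldsymbol\alpha|$, this becomes

$$|\boldsymbol\beta|^{\frac12(n-1)}\int|\mathbf c|^{\frac12(n-p-2)}\exp\Big(-\tfrac12\sum_{j,k}\beta_{jk}c_{jk}\Big)\prod_{j \le k}dc_{jk} = 2^{\frac12 p(n-1)}\,\pi^{\frac14 p(p-1)}\prod_{j=1}^{p}\Gamma\{\tfrac12(n-j)\}. \tag{41.50}$$

Replace $n$ by $n + 2t$, keeping the $\beta_{jk}$ fixed, and divide by $|\boldsymbol\beta|^t$. We then have

$$|\boldsymbol\beta|^{\frac12(n-1)}\int|\mathbf c|^{\frac12(n-p-2)+t}\exp\Big(-\tfrac12\sum_{j,k}\beta_{jk}c_{jk}\Big)\prod_{j \le k}dc_{jk} = \frac{2^{\frac12 p(n-1)+pt}\,\pi^{\frac14 p(p-1)}}{|\boldsymbol\beta|^{t}}\prod_{j=1}^{p}\Gamma\{\tfrac12(n-j)+t\}. \tag{41.51}$$

Thus, if we divide by the right-hand side of (41.50), we have on the left the expectation of $|\mathbf c|^t$, and

$$E(|\mathbf c|^t) = \frac{2^{pt}}{n^{pt}\,|\boldsymbol\alpha|^{t}}\prod_{j=1}^{p}\frac{\Gamma\{\frac12(n-j)+t\}}{\Gamma\{\frac12(n-j)\}} \tag{41.52}$$

$$= \frac{2^{pt}}{n^{pt}}\prod_{j=1}^{p}\frac{\Gamma\{\frac12(n-j)+t\}}{\Gamma\{\frac12(n-j)\}}\;|\boldsymbol\gamma|^{t}. \tag{41.53}$$

From this we can determine as many moments or cumulants as we wish. Again the substitutional device, obviating the awkward integrations, is to be noted.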


41.13 One consequence of (41.53) is noteworthy. If we write $d_{jk}$ for the sums of products about the mean, so that $d_{jk} = nc_{jk}$, we have

$$E\Big(\frac{|\mathbf d|^t}{|\boldsymbol\gamma|^t}\Big) = 2^{pt}\prod_{j=1}^{p}\frac{\Gamma\{\frac12(n-j)+t\}}{\Gamma\{\frac12(n-j)\}}. \tag{41.54}$$

Now a $\chi^2$ variable with $\nu$ degrees of freedom has moments about zero given by

$$\mu'_t = 2^t\,\frac{\Gamma(\frac12\nu + t)}{\Gamma(\frac12\nu)}.$$

The right-hand side of (41.54) is the product of $p$ such factors with $\nu = n-1, n-2, \ldots, n-p$. Remembering that the moment of a product of independent variables is the product of their moments, we see that $|\mathbf d|/|\boldsymbol\gamma|$ can be represented as the product of $p$ independent factors, distributed as $\chi^2$ with $n-1, n-2, \ldots, n-p$ degrees of freedom.

Example 41.3

When $p = 1$ we have the familiar result that a sum of squares, standardized by division by a parent variance, is distributed as $\chi^2$ with $n - 1$ degrees of freedom.

For $p = 2$ we find from (41.54) for the moments of $|\mathbf d|/|\boldsymbol\gamma|$

$$\mu'_t = 2^{2t}\,\frac{\Gamma\{\frac12(n-1)+t\}\,\Gamma\{\frac12(n-2)+t\}}{\Gamma\{\frac12(n-1)\}\,\Gamma\{\frac12(n-2)\}}. \tag{41.55}$$

From the duplication formula for the Gamma function,

$$\Gamma(x)\,\Gamma(x+\tfrac12) = \frac{\pi^{\frac12}\,\Gamma(2x)}{2^{2x-1}}, \tag{41.56}$$

we reduce this to

$$\mu'_t = \frac{\Gamma(n-2+2t)}{\Gamma(n-2)}. \tag{41.57}$$
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^ - .he (20* ^ ment 21 ^ '^' ! y J a s: Of “he result of Exercise 1X .9, Vol . ^ 
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Tnble with 2(«- 2 ) d * lr * 

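41.14 There is considerable interest in the joint distribution of sample correlations. In the non-null case (non-vanishing parent correlations) the distributions are very complicated, but we can make some progress in the null case. When the parent variances are unity and the parent correlations zero, the general Wishart distribution (41.36) reduces to

$$dF \propto |c|^{\frac12(n-p-2)} \exp\Bigl(-\tfrac12 n \sum_{j} c_{jj}\Bigr) \prod_{j \le k} dc_{jk}. \tag{41.58}$$

Transforming to the sample standard deviations $s_j$ and correlations $r_{jk}$ by

$$c_{jj} = s_j^2, \tag{41.59}$$

$$c_{jk} = s_j s_k r_{jk}, \qquad j < k, \tag{41.60}$$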


we find that the Jacobian is given by

$$J = 2^{p} \prod_{j=1}^{p} s_j^{\,p}.$$

This is independent of the $r$'s, as is the exponent in (41.58), and consequently terms in $s$ integrate off, leaving us with the distribution of correlations

$$dF = \frac{[\Gamma\{\tfrac12(n-1)\}]^{p}}{\pi^{\frac14 p(p-1)} \prod_{j=1}^{p} \Gamma\{\tfrac12(n-j)\}}\; |r|^{\frac12(n-p-2)} \prod_{j<k} dr_{jk}, \tag{41.61}$$

where $|r|$ is the sample correlation determinant and the constant has been adjusted so that the frequency function integrates to unity. We are again hindered from making progress by the boundaries of the domain of variation. In the manner of 41.12, however, we can find the moments of $|r|$ itself. They are

$$E|r|^{t} = \frac{[\Gamma\{\tfrac12(n-1)\}]^{p}}{[\Gamma\{\tfrac12(n-1)+t\}]^{p}} \prod_{j=1}^{p} \frac{\Gamma\{\tfrac12(n-j)+t\}}{\Gamma\{\tfrac12(n-j)\}}. \tag{41.62}$$


Example 41.4

Writing $L = \log|r|$, $\nu = \tfrac12(n-1)$, $m = \tfrac12(j-1)$, we have from (41.62)

$$E(e^{tL}) = \prod_{j=1}^{p} \frac{\Gamma(\nu)\,\Gamma(\nu-m+t)}{\Gamma(\nu+t)\,\Gamma(\nu-m)}. \tag{41.63}$$

This was shown to hold for real $t$, but by analytic continuation may be extended to complex values. We may therefore regard $t$ as an imaginary number, and (41.63) then gives the characteristic function of $L$. For the corresponding cumulant-generating function we then have

$$\psi(t) = \sum_{j=1}^{p} \{\log\Gamma(\nu) + \log\Gamma(\nu-m+t) - \log\Gamma(\nu+t) - \log\Gamma(\nu-m)\}. \tag{41.64}$$

Recall the expansion

$$\log\Gamma(x) = (x-\tfrac12)\log x - x + \tfrac12\log(2\pi) + \frac{1}{12x} - \frac{1}{360x^{3}} + \ldots$$

For large $\nu$, substitution in (41.64) gives us for the term in braces

$$-\frac{mt}{\nu} + \frac{1}{2\nu^{2}}\{-m(m+1)t + mt^{2}\} + O(\nu^{-3}). \tag{41.65}$$

Now put $m = \tfrac12(j-1)$ and sum from $j = 1$ to $p$. We have

$$\sum m = \tfrac14 p(p-1), \qquad \sum m(m+1) = \tfrac{1}{24}p(p-1)(2p+5). \tag{41.66}$$

Thus

$$\psi(t) = -\frac{p(p-1)}{4\nu}\,t + \frac{1}{2\nu^{2}}\Bigl\{-\tfrac{1}{24}p(p-1)(2p+5)\,t + \tfrac14 p(p-1)\,t^{2}\Bigr\} + O(\nu^{-3}), \tag{41.67}$$

so that

$$E(L) = -\tfrac14 p(p-1)/\nu + O(\nu^{-2}), \tag{41.68}$$

$$\operatorname{var} L = \tfrac14 p(p-1)/\nu^{2} + O(\nu^{-3}). \tag{41.69}$$

Furthermore, since for large $x$

$$\frac{d^{k}}{dx^{k}}\log\Gamma(x) \sim \frac{(-1)^{k}(k-2)!}{x^{k-1}}, \qquad k > 1, \tag{41.70}$$

we have from (41.64), to the greatest power in $\nu$,

$$\kappa_k = \sum_{j=1}^{p} (-1)^{k}(k-2)!\left\{\frac{1}{(\nu-m+t)^{k-1}} - \frac{1}{(\nu+t)^{k-1}}\right\}_{t=0}$$

$$= \sum_{j=1}^{p} \frac{(-1)^{k}(k-1)!\,m}{\nu^{k}}, \qquad \text{where } m = \tfrac12(j-1),$$

$$= (-2\nu)^{-k}\, 2^{k-1}(k-1)!\, \tfrac12 p(p-1). \tag{41.71}$$

Comparison with equation (16.4), Volume 1, shows that, with a suitable choice of origin, $-2\nu\log|r|$ is asymptotically distributed as $\chi^2$ with $\tfrac12 p(p-1)$ degrees of freedom. To order $\nu^{-1}$ the origin, from (41.68), is seen to be zero.

Thus $-(n-1)\log|r|$ is asymptotically distributed as $\chi^2$ with $\tfrac12 p(p-1)$ d.fr. Bartlett (1951) gave a slightly more refined result, namely that

$$-\{n - \tfrac16(2p+11)\}\log|r| \text{ is } \chi^2 \text{ with } \tfrac12 p(p-1) \text{ d.fr.} \tag{41.72}$$

The extra term derives from an allowance for the item of order $n^{-1}$ in the mean, but in practice this is a refinement of minor importance.
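The size of Bartlett's refinement can be examined numerically. The sketch below (an illustration, not part of the original text) evaluates the logarithm of the moment function (41.63) with `math.lgamma` and differentiates it numerically at $t = 0$ to obtain the mean and variance of $L = \log|r|$ in the null case. These are then scaled by the multipliers $n-1$ and $n - \tfrac16(2p+11)$ and may be compared with the mean $\tfrac12 p(p-1)$ and variance $p(p-1)$ of the limiting $\chi^2$ form. The function names and the step length `h` are assumptions made for the illustration.

```python
import math


def log_moment_function(t, n, p):
    """log E(|r|^t) in the null case, from (41.63), with nu = (n-1)/2."""
    nu = 0.5 * (n - 1)
    total = 0.0
    for j in range(1, p + 1):
        m = 0.5 * (j - 1)
        total += (math.lgamma(nu) + math.lgamma(nu - m + t)
                  - math.lgamma(nu + t) - math.lgamma(nu - m))
    return total


def mean_and_variance_of_log_r(n, p, h=1e-3):
    """First two cumulants of L = log|r| by central differences at t = 0."""
    f_plus = log_moment_function(h, n, p)
    f_zero = log_moment_function(0.0, n, p)
    f_minus = log_moment_function(-h, n, p)
    mean = (f_plus - f_minus) / (2.0 * h)
    variance = (f_plus - 2.0 * f_zero + f_minus) / (h * h)
    return mean, variance


def scaled_moments(n, p, multiplier):
    """Mean and variance of -multiplier * log|r|."""
    mean, variance = mean_and_variance_of_log_r(n, p)
    return -multiplier * mean, multiplier * multiplier * variance


if __name__ == "__main__":
    n, p = 101, 4
    print("chi-squared target:", 0.5 * p * (p - 1), p * (p - 1))
    print("multiplier n-1:    ", scaled_moments(n, p, n - 1))
    print("Bartlett (41.72):  ", scaled_moments(n, p, n - (2 * p + 11) / 6.0))
```

For moderate $n$ the Bartlett multiplier brings both moments closer to those of $\chi^2$, in line with the remark that the correction is a refinement rather than a change of substance.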

Example 41.5

It is interesting to compare the results of 41.13, concerning the dispersion determinant, with those of the previous example for the correlation determinant in the null case.

Without loss of generality, suppose that the parent variances are unity and the parent correlations zero. Then, from 41.13,

$$n^{p}|c| = (ns_1^2)(ns_2^2)\cdots(ns_p^2)\,|r|$$

is the product of $p$ independent $\chi^2$ variables with $n-1, n-2, \ldots, n-p$ degrees of freedom, whereas $-(n-1)\log|r|$ is asymptotically $\chi^2$ with $\tfrac12 p(p-1)$ degrees of freedom.

Now from the results of Chapter 27, applied to the sample instead of the population, we have in present notation

$$\frac{|r|}{R_{11}} = 1 - R^{2}_{1(2\ldots p)}, \tag{41.73}$$

where $R_{1(2\ldots p)}$ is the multiple correlation of $x_1$ on $x_2, x_3, \ldots, x_p$ and $R_{11}$ is the cofactor of $r_{11}$ in the correlation determinant. By repeated application of this formula we have, on re-ordering suffixes,

$$|r| = \{1 - R^{2}_{1(2\ldots p)}\}\{1 - R^{2}_{2(3\ldots p)}\}\cdots\{1 - R^{2}_{p-1(p)}\}. \tag{41.74}$$

Moreover, all the $R$'s are independent (cf. 27.30), so all the factors on the right are independent.

The distribution of $U = 1 - R^2$, where $R$ is based on a sample of $n$ and $q$ variables, is (from (27.74))

$$dF = \frac{U^{\frac12(n-q-2)}(1-U)^{\frac12(q-3)}}{B\{\tfrac12(n-q), \tfrac12(q-1)\}}\, dU, \tag{41.75}$$

and $-(n-1)\log U$ is found to be distributed approximately as $\chi^2$ with $q-1$ degrees of freedom.

Thus $-(n-1)\log|r|$ is approximately the sum of $p-1$ independent $\chi^2$ factors with $p-1, p-2, \ldots, 1$ degrees of freedom, namely as a $\chi^2$ with $\tfrac12 p(p-1)$ d.fr. This checks with the result we obtained in the previous example.

The ratio $|r|/R_{11}$ is also equal (cf. (27.34)) to $s^{2}_{1.2\ldots p}/s_1^2$. In $R_{11}$ the corresponding ratio is equal to $s^{2}_{2.3\ldots p}/s_2^2$, and so on. Thus

$$n^{p}|c| = \{ns^{2}_{1.2\ldots p}\}\{ns^{2}_{2.3\ldots p}\}\cdots\{ns_p^{2}\}. \tag{41.76}$$

The sums of squares of type $ns^{2}_{j(k)}$ are residuals which are all independent, and are based on $n-p, n-p+1, \ldots, n-1$ degrees of freedom. Thus $n^{p}|c|$ is the product of independent $\chi^2$ factors with those d.fr., confirming the results of 41.13.

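Hotelling's $T^2$

41.15 We now proceed to derive a generalization, due to Hotelling, of “Student's” $t$. As in 41.13, we write $d_{jk} = nc_{jk}$. Let $(D_{jk})$ be the inverse of $(d_{jk})$. Define

$$T^{2} = n(n-1)\sum_{j,k=1}^{p} D_{jk}\,\bar{x}_j\bar{x}_k. \tag{41.77}$$

When $p = 1$ we have $d_{11} = nm_2$, $D_{11} = 1/(nm_2)$, $m_2$ being the sample variance, so that $T^2 = (n-1)\bar{x}^2/m_2$ is the square of “Student's” $t$ in the univariate case.

Let $m_{jk}$ denote the sums of products about the origin, so that

$$m_{jk} = d_{jk} + n\bar{x}_j\bar{x}_k. \tag{41.78}$$

The determinant $|m_{jk}|$ may be written

$$\begin{vmatrix}
1 & \bar{x}_1\sqrt{n} & \bar{x}_2\sqrt{n} & \cdots & \bar{x}_p\sqrt{n} \\
0 & d_{11}+n\bar{x}_1^2 & d_{12}+n\bar{x}_1\bar{x}_2 & \cdots & d_{1p}+n\bar{x}_1\bar{x}_p \\
0 & d_{21}+n\bar{x}_2\bar{x}_1 & d_{22}+n\bar{x}_2^2 & \cdots & d_{2p}+n\bar{x}_2\bar{x}_p \\
\vdots & \vdots & \vdots & & \vdots \\
0 & d_{p1}+n\bar{x}_p\bar{x}_1 & d_{p2}+n\bar{x}_p\bar{x}_2 & \cdots & d_{pp}+n\bar{x}_p^2
\end{vmatrix}. \tag{41.79}$$

Subtracting $\bar{x}_1\sqrt{n}$ times the first row from the second, $\bar{x}_2\sqrt{n}$ times the first row from the third, and so on, we find

$$|m_{jk}| = \begin{vmatrix}
1 & \bar{x}_1\sqrt{n} & \cdots & \bar{x}_p\sqrt{n} \\
-\bar{x}_1\sqrt{n} & d_{11} & \cdots & d_{1p} \\
-\bar{x}_2\sqrt{n} & d_{21} & \cdots & d_{2p} \\
\vdots & \vdots & & \vdots \\
-\bar{x}_p\sqrt{n} & d_{p1} & \cdots & d_{pp}
\end{vmatrix}. \tag{41.80}$$

Expand by the border row and column. We find

$$|m_{jk}| = |d_{jk}| + |d_{jk}|\; n\sum_{j,k=1}^{p} D_{jk}\,\bar{x}_j\bar{x}_k. \tag{41.81}$$

From (41.77) it then follows that

$$\frac{|d_{jk}|}{|m_{jk}|} = \frac{1}{1 + T^{2}/(n-1)}. \tag{41.82}$$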

41.16 Consider the geometrical interpretation of this result. In the case $p = 1$ the numerator and denominator of (41.82) reduce to $d_{11}$ and $m_{11}$, that is to say to the squares of the distances from the sample point $P_1$ (in the $n$-dimensional sample space) to its projection on the unit vector whose direction cosines are all equal, and from $P_1$ to the origin $O$ respectively. The ratio is the square of the sine of the angle between $OP_1$ and the unit vector. This was the geometrical approach which gave us “Student's” distribution in Example 11.8 (Vol. 1).

In the general case, consider the $p$ superposed sample spaces discussed in the derivation of Wishart's distribution in 41.7. From a relation similar to that of (41.32) we see that $|m_{jk}|$ is the square of the volume of a parallelotope (generalized parallelogram) with one corner at the origin and sides parallel to $OP_1, OP_2, \ldots, OP_p$. Further, if $H$ is a hyperplane perpendicular to the unit vector meeting it in $O'$, it is easy to see that the projections of $P_1, P_2, \ldots, P_p$ on to $H$, say $P_1', P_2', \ldots, P_p'$, are such that $d_{jk}$ represents sums of squares and cross-products in $H$ referred to $O'$ as origin. Thus $|d_{jk}|$ is the square of the volume of a parallelotope in $H$. Thus


) 


the ratio of ( 
unit vector. 


d jk j to | m jk | is the square of the cosine of the angle between H and the 


If the angle is $\theta$ we then have

$$\frac{1}{1 + T^{2}/(n-1)} = \cos^{2}\theta. \tag{41.83}$$


Now if the sample points $P$ are distributed at random in the $n$-spaces, the hyperplane which they determine is distributed at random in regard to the angle which it makes with a fixed vector. The problem of the distribution of $\theta$ is then that of the distribution of an angle, and this is the same, from a slightly different viewpoint, as the problem of the distribution of the multiple correlation coefficient $R$, for we saw in Chapter 27 that $R$ is expressible in terms of the angle between one variable and the space of the remaining variables $x_2, \ldots, x_p$. We may write

$$\frac{1}{1 + T^{2}/(n-1)} = 1 - R^{2}, \tag{41.84}$$

where we must remember that in the distribution of $R^2$, namely

$$dF = \frac{(R^{2})^{\frac12(p-3)}(1-R^{2})^{\frac12(n-p-2)}}{B\{\tfrac12(p-1), \tfrac12(n-p)\}}\, dR^{2}, \tag{41.85}$$

$p$ is the total number of variables entering the regression equation and the variates are measured from their means. For Hotelling's distribution we must increase $p$ by unity, since we are considering $p+1$ variables, the unit vector being the extra one, and we must also increase $n$ by unity because our variation is measured from the origin and not from the means. Making these adjustments and substituting in (41.85) from (41.84), we find for the distribution of $T^2$

$$dF = \frac{1}{B\{\tfrac12(n-p), \tfrac12 p\}}\; \frac{\{T^{2}/(n-1)\}^{\frac12(p-2)}}{\{1 + T^{2}/(n-1)\}^{\frac12 n}}\; d\left\{\frac{T^{2}}{n-1}\right\}. \tag{41.86}$$

Equivalently, we may say that

$$\frac{n-p}{p(n-1)}\,T^{2} \text{ has an } F(p, n-p) \text{ distribution.} \tag{41.87}$$

This may be used in the obvious way to test a hypothetical vector of means $\mu_0$ by measuring $x$ from this as origin and then using the distribution (41.87).
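The definitions above translate directly into a small computation. The following sketch (an illustration, not part of the original text) forms $d_{jk}$ and $m_{jk}$ from a data matrix whose rows are observations already measured from the hypothetical means, computes $T^2$ from (41.77), the $F$ form of (41.87), and the ratio $|d|/|m|$, so that the identity (41.82) can be confirmed numerically. The helper names `det`, `solve` and `hotelling` are assumptions; the linear algebra is done by plain Gaussian elimination to keep the example self-contained.

```python
def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    size, value = len(a), 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            value = -value
        value *= a[col][col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
    return value


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (a assumed non-singular)."""
    size = len(a)
    aug = [a[i][:] + [b[i]] for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(size):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                for k in range(col, size + 1):
                    aug[row][k] -= factor * aug[col][k]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def hotelling(sample):
    """Return (T^2, F statistic, |d|/|m|) for rows measured from the tested origin."""
    n, p = len(sample), len(sample[0])
    xbar = [sum(row[j] for row in sample) / n for j in range(p)]
    # m: product-sums about the origin; d: product-sums about the means (41.78).
    m = [[sum(row[j] * row[k] for row in sample) for k in range(p)] for j in range(p)]
    d = [[m[j][k] - n * xbar[j] * xbar[k] for k in range(p)] for j in range(p)]
    w = solve(d, xbar)                      # w = D xbar, D the inverse of d
    t2 = n * (n - 1) * sum(xbar[j] * w[j] for j in range(p))
    f_stat = (n - p) / (p * (n - 1)) * t2   # referred to F(p, n-p) by (41.87)
    return t2, f_stat, det(d) / det(m)


if __name__ == "__main__":
    data = [[1.2, 0.7], [0.4, 1.9], [2.1, 1.1], [1.5, 2.4], [0.9, 0.3], [1.8, 1.6]]
    t2, f_stat, ratio = hotelling(data)
    print(t2, f_stat, ratio, 1.0 / (1.0 + t2 / (len(data) - 1)))
```

To test a non-zero hypothetical mean vector, the hypothetical means would first be subtracted from each row, as the text describes.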


0> 


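41.17 The distribution may also be derived by using the substitutional device of 41.12. Starting from the Wishart distribution, we note that if product-moments about the origin, say $c'$, are used instead of those about means the distribution is the same except that there is an extra degree of freedom. We then have

$$dF = \frac{n^{\frac12 pn}\,|c'|^{\frac12(n-p-1)}}{2^{\frac12 pn}\,\pi^{\frac14 p(p-1)}\,|\gamma|^{\frac12 n}\prod_{j=1}^{p}\Gamma\{\tfrac12(n+1-j)\}}\, \exp\Bigl\{-\tfrac12 n\sum_{j,k}\alpha_{jk}c'_{jk}\Bigr\}\prod_{j\le k} dc'_{jk}. \tag{41.88}$$

As in (41.53) we find

$$E(|c'|^{t}) = \frac{2^{pt}}{n^{pt}}\prod_{j=1}^{p}\frac{\Gamma\{\tfrac12(n+1-j)+t\}}{\Gamma\{\tfrac12(n+1-j)\}}\;|\gamma|^{t}. \tag{41.89}$$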
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W e may also write (41.88) in the f orm BUTl ° N 


disB* 


ibution 


,1 Ap-D | cY An _~ P ~ 2) 
if « jp*=TR r{««-))-} ex P (- i 2 n d 


glVe “t>yo« t 


tVe recall that | c 



xi£ii 

Tfw e molllpty this by | c' |« and integrate 

Replace nbyn + 2u and divide by |^|«. w h ** ** font, on th ' 

constants, then Have, on division by “ 41 % 

n , ,, | , = 2«‘+«) » m Dy a PP ro priate 

i- 1 r ff(tHTqvri^ M(!d)+«) 

,w 1 r- l/l c' | = ! <i 1/, OT ). Pin :tll ] \ r ®»-i)T- l 4 ’- 91 ) 

We then have

$$E\{(|d|/|m|)^{u}\} = \prod_{j=1}^{p}\frac{\Gamma\{\tfrac12(n-j)+u\}\,\Gamma\{\tfrac12(n+1-j)\}}{\Gamma\{\tfrac12(n-j)\}\,\Gamma\{\tfrac12(n+1-j)+u\}} \tag{41.92}$$

$$= \frac{\Gamma\{\tfrac12(n-p)+u\}\,\Gamma(\tfrac12 n)}{\Gamma(\tfrac12 n+u)\,\Gamma\{\tfrac12(n-p)\}} = \frac{B\{\tfrac12(n-p)+u, \tfrac12 p\}}{B\{\tfrac12(n-p), \tfrac12 p\}}. \tag{41.93}$$

This is the $u$th moment of

$$dF = \frac{x^{\frac12(n-p)-1}(1-x)^{\frac12 p-1}}{B\{\tfrac12(n-p), \tfrac12 p\}}\,dx, \qquad 0 \le x \le 1, \tag{41.94}$$

which is uniquely determined by its moments. This then is the distribution of the ratio $|d|/|m|$, and on substitution for $T$ from (41.82) we arrive at (41.86).

It will be seen that the essential feature of the $T^2$ statistic is that it is of form $z'\hat{V}^{-1}z$, where $z$ is a multinormal vector with means zero and dispersion matrix $V$, while $\hat{V}$ is the ML estimator of $V$, adjusted to be unbiassed. Above, we had $z = \bar{x}$, $\hat{V} = d/\{n(n-1)\}$. Similarly, we may define a test statistic for the two-sample case (generalizing “Student's” two-sample $t^2$ test) with $z = (\bar{x}_1 - \bar{x}_2)$, $\hat{V} = P\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$, where $P$ is the unbiassed estimator of the common dispersion matrix of the two populations, calculated from the pooled samples.


41.18 So far we have considered multivariate generalizations of means, dispersions, and the Student ratio. There is a similar generalization of the Fisher ratio of independent variances. We defer a discussion of this to the next chapter, where it arises naturally in connection with tests of hypotheses. We may remark here, however, that exact distributions in closed form corresponding to Fisher's $z$, for example, are difficult to derive, but that moments can usually be obtained in the manner of 41.12.

A further point of some interest is that, in the generalization of variance analysis, it is not the ratio of two independent dispersion determinants which arises for test, but the
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41.19 E«“ fM th ^cond, there » »J® ^relations), U# +3) m all. ^ j 
order higher ‘han * e J#tf -1) covar.a'tces ( 65 . Th e distrrbutronal p r „b, * , 

ZtiXS: -< 3 ; ~ 


(41.95) j 

To the same order,

$$E(c_{jk}c_{lm}) = \frac{1}{n^{2}}\,E\Bigl\{\sum_{i} x_{ji}x_{ki}\sum_{i'} x_{li'}x_{mi'}\Bigr\}. \tag{41.96}$$

If $i \ne i'$ the two factors on the right are independent. There are $n(n-1)$ such terms. If $i = i'$ there are $n$ terms such as $E(x_jx_kx_lx_m)$. In general this involves fourth-order cumulants, but for normal variation we see from the c.f. of the $x$'s,

$$\phi = \exp\Bigl(-\tfrac12\sum_{j,k}\gamma_{jk}t_jt_k\Bigr),$$

that

$$E(x_jx_kx_lx_m) = \gamma_{jk}\gamma_{lm} + \gamma_{jm}\gamma_{kl} + \gamma_{jl}\gamma_{km}.$$

Substitution in (41.96) then gives

$$E(c_{jk}c_{lm}) = \gamma_{jk}\gamma_{lm} + \frac{1}{n}(\gamma_{jm}\gamma_{kl} + \gamma_{jl}\gamma_{km}). \tag{41.97}$$

Hence

$$\operatorname{cov}(c_{jk}, c_{lm}) = \frac{1}{n}(\gamma_{jm}\gamma_{kl} + \gamma_{jl}\gamma_{km}). \tag{41.98}$$

In particular, if $j = k = l = m = 1$ we have the known result

$$\operatorname{var} c_{11} = \frac{2\gamma_{11}^{2}}{n}.$$


41.6 

r notation, it 


(41.98) 


(41.99) 


being indifferent whether we write parent 

Parent or ^mple symbols, 
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Relations of type (41.98) reduce this to 



$$n \operatorname{var} r = (1 - r^{2})^{2}.$$

The latent roots of a dispersion matrix

41.20 In later chapters we shall encounter several situations in which we are interested in the latent roots of a stochastic matrix. If $(c_{jk})$ is a dispersion matrix we shall wish to study the behaviour of the roots of the $p$-ic in $\lambda$

$$|c_{jk} - \lambda\delta_{jk}| \equiv |c - \lambda I| = 0, \tag{41.104}$$

where $\delta_{jk}$ is the Kronecker symbol, equal to zero unless $j = k$, in which case it is equal to unity.

We take from matrix theory the result that if $c$ is a positive definite matrix the latent roots are all real and non-negative. Only exceptionally will the roots be equal, and if $q$ of them are zero $c$ is singular and has rank $p-q$.

41.21 In point of fact (41.104) is a particular case of a rather more general form

$$|c_{jk} - \lambda b_{jk}| = 0, \tag{41.105}$$

where $b$, $c$ are independent dispersion matrices based on $m$, $n$ observations respectively. We may write equivalently

$$|u(c_{jk} + b_{jk}) - b_{jk}| = 0, \tag{41.106}$$

where

$$u = \frac{1}{1+\lambda}. \tag{41.107}$$

The complexity of the distributional problem arises from the fact that the roots in $\lambda$ or $u$ are not algebraic functions of the dispersions. It is easier to derive sampling distributions of symmetric functions of the roots than of the individual roots themselves.
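The equivalence of the two determinantal equations can be seen in a small numerical case. The sketch below (illustrative only; the matrices and function names are assumptions) solves $|c - \lambda b| = 0$ for symmetric $2 \times 2$ matrices by expanding the determinant as a quadratic in $\lambda$, and then confirms that each $u = 1/(1+\lambda)$ satisfies $|u(c+b) - b| = 0$.

```python
import math


def det2(a):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def lambda_roots(c, b):
    """Roots of |c - lambda b| = 0 for symmetric 2 x 2 matrices c and b.

    Expanding the determinant gives
    |b| lambda^2 - (c11 b22 + c22 b11 - 2 c12 b12) lambda + |c| = 0.
    """
    qa = det2(b)
    qb = -(c[0][0] * b[1][1] + c[1][1] * b[0][0] - 2.0 * c[0][1] * b[0][1])
    qc = det2(c)
    disc = math.sqrt(qb * qb - 4.0 * qa * qc)
    return sorted([(-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)])


def residual_for_u(u, c, b):
    """Value of |u(c + b) - b|, which should vanish at u = 1/(1 + lambda)."""
    mat = [[u * (c[j][k] + b[j][k]) - b[j][k] for k in range(2)] for j in range(2)]
    return det2(mat)


if __name__ == "__main__":
    c = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical dispersion matrices
    b = [[1.0, 0.2], [0.2, 1.5]]
    for lam in lambda_roots(c, b):
        u = 1.0 / (1.0 + lam)
        print(lam, u, residual_for_u(u, c, b))
```

For larger $p$ a general symmetric eigenvalue routine would be needed; the quadratic is used here only because it keeps the example free of external libraries.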

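41.22 We assume that the parent variation is normal with unit variances and zero covariances. The joint distribution of the $c_{jk}$ and $b_{jk}$ then has the frequency function, as at (41.58),

$$f \propto \frac{|c|^{\frac12(n-p-2)}\,|b|^{\frac12(m-p-2)}}{\prod_{j=1}^{p}\Gamma\{\tfrac12(n-j)\}\,\Gamma\{\tfrac12(m-j)\}} \times \text{an exponential factor in the } c_{jj} \text{ and } b_{jj}. \tag{41.108}$$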
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(+ 1 . 112 ) 


and we may choose the 1 , so t a _ 1 

Ifl is ,he ? * ? matrix of latent ^ +e) ^ ^ ^ We now ^ 

^ * ***'"* v 
We also have from (41.109)^^ ^ i , (b + c)lj 

-»— - j • tr.isa 

so that

$$l_i'\,b\,l_j = u_j\, l_i'(b+c)\,l_j. \tag{41.113}$$

It follows from (41.112) and (41.113) that, if $u_i \ne u_j$,

$$l_i'(b+c)\,l_j = 0. \tag{41.114}$$

Multiplying (41.111) on the left by $l'$ and using (41.110) we have

$$l'bl = u. \tag{41.115}$$

From (41.110) and (41.114),

$$l'(b+c)\,l = I \tag{41.116}$$

and hence

$$b + c = (l')^{-1}l^{-1}. \tag{41.117}$$

Likewise from (41.115)

$$b = (l')^{-1}u\,l^{-1} \tag{41.118}$$

and hence

$$c = (l')^{-1}(I-u)\,l^{-1}. \tag{41.119}$$

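41.23 Looking back to (41.108), we see that with the transformation (41.118-19) the frequency function is given by

$$f \propto \frac{\prod_{j=1}^{p} u_j^{\frac12(m-p-2)}\,\prod_{j=1}^{p}(1-u_j)^{\frac12(n-p-2)}}{\prod_{j=1}^{p}\Gamma\{\tfrac12(n-j)\}\,\Gamma\{\tfrac12(m-j)\}} \times \text{a function of } l. \tag{41.120}$$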

We now consider the Jacobian of the transformation. There are $\tfrac12 p(p+1)$ variables in each of $b$ and $c$, $p(p+1)$ in all; and $p$ variables in $u$ and $p^2$ in $l$, again making $p(p+1)$. The Jacobian of $b$, $c$ is the same as that of $b$, $b+c$. Consider, for illustration, the case $p = 2$. Writing $g = l^{-1}$, we have

$$b = g'ug, \tag{41.121}$$

$$b + c = g'g. \tag{41.122}$$

For the Jacobian we have

$$J = \frac{\partial\{b_{11}, b_{12}, b_{22}, (b+c)_{11}, (b+c)_{12}, (b+c)_{22}\}}{\partial\{u_1, u_2, g_{11}, g_{12}, g_{21}, g_{22}\}} = \begin{vmatrix}
g_{11}^{2} & g_{21}^{2} & 2g_{11}u_1 & 0 & 2g_{21}u_2 & 0 \\
g_{11}g_{12} & g_{21}g_{22} & g_{12}u_1 & g_{11}u_1 & g_{22}u_2 & g_{21}u_2 \\
g_{12}^{2} & g_{22}^{2} & 0 & 2g_{12}u_1 & 0 & 2g_{22}u_2 \\
0 & 0 & 2g_{11} & 0 & 2g_{21} & 0 \\
0 & 0 & g_{12} & g_{11} & g_{22} & g_{21} \\
0 & 0 & 0 & 2g_{12} & 0 & 2g_{22}
\end{vmatrix}. \tag{41.123}$$

If $u_1 = u_2$ we can subtract multiples of the bottom three rows from the top three to obliterate all except the first two terms in these rows, and the determinant vanishes. It follows that every $u_j - u_k$, $j > k$, is a factor of the Jacobian. The product of these factors is of degree $\tfrac12 p(p-1)$ in the $u$'s. There can be no other terms involving $u$ because the Jacobian can be of no higher degree.

Thus, for the $u$'s alone we have from (41.120)

$$dF = k\,\frac{\prod_{j=1}^{p} u_j^{\frac12(m-p-2)}(1-u_j)^{\frac12(n-p-2)}}{\prod_{j=1}^{p}\Gamma\{\tfrac12(m-j)\}\,\Gamma\{\tfrac12(n-j)\}}\,\prod_{j<k}(u_j - u_k)\prod_{j=1}^{p} du_j, \tag{41.124}$$

where $k$ is some constant. To evaluate it by explicit integration would be very difficult. The following indirect route suffices:

$k$ arises from terms in the original density and the Jacobian involving $p$ and $m+n$, but not $m$ separately. Write it then as $k(p, m+n)$. Note that $\prod u_j = |b|/|b+c|$. If we increase $m$ by $2t$ in (41.124) and integrate we have the $t$th moment of this ratio except for a factor $k(p, m+n+2t)$. After the manner of 41.12 we can find this moment, and there results

$$\frac{k(p, m+n+2t)}{k(p, m+n)} = \prod_{j=1}^{p}\frac{\Gamma\{\tfrac12(m+n-1-j)+t\}}{\Gamma\{\tfrac12(m+n-1-j)\}}. \tag{41.125}$$

It follows that

$$k(p, m+n) = K(p)\prod_{j=1}^{p}\Gamma\{\tfrac12(m+n-1-j)\}, \tag{41.126}$$

where $K(p)$ is a function of $p$ only. To evaluate it, make the substitution in (41.124) of $u_j = v_j/n$ and let $n$ tend to infinity. Our distribution becomes

jp _ exp (— 2 ®j) If (®,-—%) n dVj 

„r{l(m4-n-l-)')}X(f>) 

This may be evaluated by step-by-step substitutions of the type 

V 1 = Ml 

= Wj + V 1, j>h 


(41.12’ 


Va 
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. so that the coefficient in TUlj vanishes, as we ma 

and choosing « at each stag h^ ^ <0 

the result is independent ^ T (p + \ r i) . 

2 ]j>(p rr i> r {-|(/> —j )} 


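Use of the duplication formula (41.56) for the Gamma function gives us

$$K(p) = \frac{\pi^{\frac12 p}}{\prod_{j=1}^{p}\Gamma\{\tfrac12(p+1-j)\}}. \tag{41.128}$$

Assembling all the factors, we find finally for the distribution

$$dF = \pi^{\frac12 p}\prod_{j=1}^{p}\frac{\Gamma\{\tfrac12(m+n-1-j)\}\,u_j^{\frac12(m-p-2)}(1-u_j)^{\frac12(n-p-2)}}{\Gamma\{\tfrac12(m-j)\}\,\Gamma\{\tfrac12(n-j)\}\,\Gamma\{\tfrac12(p+1-j)\}}\;\prod_{j<k}(u_j - u_k)\prod_{j=1}^{p} du_j, \tag{41.129}$$

a remarkable form discovered in 1939 by Fisher, P. L. Hsu, S. N. Roy, Girshick, and Mood, independently (cf. Mood (1951)).

The distribution of the $\lambda$'s is, of course, given by a simple substitution $u = 1/(1+\lambda)$.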


41.24 In the case of (41.104), when the matrix b reduces to the identity matr* 
slightly different result is obtained. We will quote the result for the distrib ' 


IX 



(41.130) 


us consider 


„ * f, 

2 K*-y, r{^r-j)}r(i(p+i-jf (4-4) n dx„ 

where now the X } are in descending order. 

Example 41.7 

These distributions are very intractable except in simple cases T 
the case when p = 2. From (41.130) we have P L 

dF = -51 (4 4) K *~‘’ exp {-jg , + n\ 

2 ” _1 r{i(*-i)}r{i( B _2)} (4-4)^^. ( 4113 

The duplication formula (41.56) for Gamma functions reduces the f 

dF m a rfA, rfj quency fuuction t0 

If we try to integrate for 2 n ^ (41.132) 

i—™. o„ ,h,c; cs'rit *" k ""'»>■ “ g™ 

</F= ’dxdy 

and on integration for x, 4r (»- 2) 




f 


) a 


uaiaciciil iGoiiiL uuicuiicu. wc win uuuie me lcsuii ior iiic aistnbution 

the roots X in this case. The reader may care to run through the foregoing D 
and modify it where necessary to obtain this result: Pr °°^ 


(41.133) v 


















£ 
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in this case the determinant reduces to 

= A2 -«+4)a+ s ; s |(i_,. S) = 0 
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^ 2 ^ 1 ^ 2 ^ 
s 1 s z r s\-X 


. - ' - (41.135) 

The sum of the roots is thus equal to the sum of the two sample variances, which are independent variables, and (when multiplied by $n$) is distributed as $\chi^2$ with $2n-2$ degrees of freedom.

41.25 We shall discuss the large-sample theory of latent roots in Chapter 43.

Non-central distributions

41.26 Just as for univariate $\chi^2$, $t$, and $F$ (variance ratio), so there arise for study here non-central multivariate distributions, especially in the consideration of power-functions of tests based on $T^2$ or related statistics. As might be expected, the resulting distributions are very cumbersome. We may note particularly

(a) The non-central Wishart distribution, as to which see T. W. Anderson (1946) and subsequent papers and his book (1958);

(b) Non-central $T^2$. Since $T^2$ is distributed in the $F$ form this is effectively a non-central $F$ (see 42.22 below).

41.27 In conclusion we may note some points which we shall have no space to develop in detail.

(1) The distribution of latent roots (41.129) reduces for $p = 1$ to a Beta distribution. Foster and Rees (1957-8) therefore called it a “generalized Beta distribution.” Following a method due to S. N. Roy (1945) and Pillai (1956), they tabulated percentiles of the largest root for $p = 2, 3, 4$ and $5$. Pillai (1966) has improved the method and tabulated (1964) up to $p = 7$.

(2) Wagle (1962) approached the distribution problem by sampling experiments on an electronic computer. The task is not a light one, but results for $p = 2, 3, 4$, for all latent roots, were successfully obtained, and calculations for higher values are only a matter of machine time.

(3) The Indian school, starting with some work by Mahalanobis (1930) on racial likeness, has developed some interesting work based on what is known as the $D^2$-statistic. See, for example, R. C. Bose (1936), R. C. Bose and S. N. Roy (1938), and many later papers by S. N. Roy. The statistic may be defined as

$$D^{2} = \sum_{j,k} a^{jk}(\bar{x}_{1j} - \bar{x}_{2j})(\bar{x}_{1k} - \bar{x}_{2k}), \tag{41.136}$$

where two samples, $x_1$ and $x_2$, are drawn from two $p$-variate populations and $(a^{jk})$ is the inverse of the pooled dispersion matrix.

The corresponding parameter

$$\Delta^{2} = \sum_{j,k} \alpha^{jk}(\mu_{1j} - \mu_{2j})(\mu_{1k} - \mu_{2k})$$

is sometimes known as Mahalanobis' generalized distance. In fact, $D^2$ is a simple function of Hotelling's $T^2$ defined for the two-sample case as in 41.17 above, i.e.

$$D^{2} = \frac{n_1 + n_2}{n_1 n_2}\,T^{2}.$$

(4) If $c_1$, $c_2$ follow Wishart distributions based on $p$ variables with sample numbers $n_1$, $n_2$, there exists a lower triangular matrix $V$ such that

$$c_1 + c_2 = VV'$$

(cf. Exercise 41.16). If $L$ is defined by

$$c_1 = VLV',$$

then $V$ and $L$ are independently distributed and $L$ has frequency function

$$f \propto |L|^{\frac12(n_1-p-2)}\,|I - L|^{\frac12(n_2-p-2)}.$$

This result is originally due to Hsu (1939). See also Kshirsagar (1961). The distribution of $L$ is sometimes called the “multivariate Beta distribution.”

(5) A summary of work on latent roots is given by A. T. James (1964). See also A. T. James (1966) and Pillai (1966).


EXERCISES 


i 


41.1 If $\bar{x}$ is the mean of a $p$-variate sample from a normal population with mean $\mu$, dispersion matrix $\gamma$, show that

$$n(\bar{x} - \mu_0)'\gamma^{-1}(\bar{x} - \mu_0)$$

is distributed as non-central $\chi^2$ with $p$ degrees of freedom and non-central parameter

$$n(\mu - \mu_0)'\gamma^{-1}(\mu - \mu_0),$$

where $\mu_0$ is a given vector.

(R. C. Bose, 1936)


f 


i 


J 


41.2 $\bar{x}_1$ and $\bar{x}_2$ are the sample means from two $p$-variate normal populations with means $\mu_1$, $\mu_2$ and common dispersion matrix $\gamma$. If $y = \bar{x}_1 - \bar{x}_2$ and $\nu = \mu_1 - \mu_2$ show that

$$\frac{n_1 n_2}{n_1 + n_2}\,(y - \nu)'\gamma^{-1}(y - \nu) \le \chi^{2}$$

is a confidence region for $\nu$, $n_1$ and $n_2$ being the respective sample numbers.


an/y-i. ^ Sh ° W * h, “ * a " d *' Sampfe ma,rix 3 < = a « statistics for „ 


r„ tnd JV f0Ur ' Varia,e n ° rmal diS,ribUtion sh0W tha * the “Nation between the covariances 


_ PnPu+ PuPu 

((! +Pn) (1 +P 3 . 1 )}** 

(Wishart, 1929) 

relations equal toT^Deftita^ P ° pulatlon has means ft all variances equal to tr 2 and all cor- 









show 


that 
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the joint frequency of 
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s 

I 


is g* en 
vvhere 


by 


w = ^(1 -r) 


dF oc «^- 3 )^-i>(»-i)-i exp 1^..!) ^ 


l 

| 


a = <7 Mi +(P-\)p}, 

6 = o 2 (i - p ). 

Hence show that u6/(v a) is distributed in the F form with n -1 r* n, 1X Jf 
derive confidence intervals for p. l > U>-1)(»-1) d.fr., and 

(Geisser, 1964) 


11 6 c is distributed in the Wishart form in satnnl<*« „ , . . 

matrix Y- Show * at h ^ h 13 distributed with corresponding paramet^ T'yh^Vbdng'an 

arbitrary non-smgular pxp matrix. ‘ ’ ueing an 


41.7 If $x$ is distributed $N(0, \gamma)$, i.e. normal with zero mean and dispersion matrix $\gamma$, and $M$ is an orthogonal matrix, show that $y = Mx$ is also distributed $N(0, \gamma)$. In particular, if $M$ is the Helmert matrix of 41.6 show that $c$ may be represented as

$$\sum_{i=1}^{n-1} y_i y_i',$$

where the $y$'s are independent and distributed as $N(0, \gamma)$. Deduce the additive property of Wishart matrices of 41.10.


r 


is 


41.8 With reference to Example 41.3 show that the frequency function of (| c |/| y | j 1 ^, 
say y, is given approximately by 


oclp(n-p) ylP(n-p)~ 1 e -ccv 


r (ip(n-p)} 


where 


« = np\\ -2n J ' 


(Hoel, 1937) 


41.9 Show that for a sample dispersion matrix $c$, $n^{1/2}\{|c|/|\gamma| - 1\}$ is distributed about zero with variance $2p$ for large samples.

(T. W. Anderson, 1958)


4U0 If a sample of n is chosen from a p-varia,e normal population *> 
grouped into k classes #!, . v- • ad... j.i. ....*».+».»•• •» 3>i+a j a+ 
ir aider ’ 


the function 


, Xp^, Xpi+l, 

W = 


I fW i 

1 IJ 1 . . 

, j'ff.ennt classes and equals the correlation 

where r it - 1 and rB is zero if the variates belong to diff 

, ^ 

r (] if they belong to the same class. 
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Show that the LR test of the ***%*£ 

.j)} ( K?*-■?)+*•> 



and show that 




i : 


(Wilks, i 9 3S) . 

41.11 As a particular case of the last exercise show that if a single variate $x_1$ is independent of a second set $x_2, \ldots, x_p$ then

$$E(W^{r}) = \frac{\Gamma\{\tfrac12(n-1)\}\,\Gamma\{\tfrac12(n-p)+r\}}{\Gamma\{\tfrac12(n-1)+r\}\,\Gamma\{\tfrac12(n-p)\}}$$

and hence find the distribution of the multiple correlation coefficient when the parent coefficient is zero.

(Wilks, 1935)

41.12 Show algebraically that Hotelling's $T^2$ is invariant under linear transformations of the $p$ variates.
41.13 For a pair 


of normal variates with correlation p, show that, defining v by 

«Ci 2 


v = 


j 


a i a iQ ~P 2 )’ 

we have for the frequency function of v 

(1 _p 2 \J(»-i) e yw 

^ ~ n* 2&~ 1 r{K«—ij} 1 i(v)}, l 

for «>0 and a similar expression with -v for v inside the curly brackets if v < 0 Vf. r . [ 
the Bessel function of second kind with imaginary argument. < °* Here K M 

(Wishart and Bartlett, 1933) 

Is e tL 4 by" ' q “ i0n (4U29) Wlth * - 2 ft* the distribution of , = 

dF = (j ~ Vy) n - p (Vy) m -P-idV v 

B ( m ~P, ^n—p + 1) ~ ' 

41.15 Verify equation (41.62). 

PtndwV®“’variabS"^" 0 Let **• > = 1, 2, . 

Yi = x x 

y 2 = x 2 ~ft' iyi 


^ 2, . . . , n, be inde¬ 


ed take the y- s tn , , Vv = x 

s * - wwul sh„: 


yp x p-b' n yi _ . * 

b\ k Cl -" ao gonal SO that x,' ' P-P-iyp-i 

* Show,ha, Wl = °.^A. Then A' 

where BittL . nen b]k ~ v ’- 

he tnan gul ar x'x = BB' 



= y'kXj/y'kyic. Take 


0 

& y 2 )* 


0 

o 


o 

o 


U P 2 


\ 

(yp y p )°) 


V 

\\ 

X 

A 
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-t ssszats zz 

(Bartlett, 1933. Cf. Kshirsagar, 1959) 


eacn 

other 


41.17 In the foregoing exercise, by taking determinants in the equation $x'x = BB'$, show that $y_p'y_p$ is the ratio of two product-sum determinants. Hence show that the diagonal elements of the inverse of $x'x$ are distributed as the reciprocal of a $\chi^2$ variable with $n-p+1$ degrees of freedom.

(Wijsman, 1957; see also Kshirsagar, 1959)

41.18 Use the previous exercise to prove the result of 41.13, that $|d|/|\gamma|$ is the product of $p$ independent $\chi^2$ factors with degrees of freedom $n-1, n-2, \ldots, n-p$.

(Wijsman, 1957)

41.19 Verify the result (41.130).

41.20 From (41.53) show that $|c|$ is a biassed estimator of $|\gamma|$. Show also that the bias is not removed by dividing product-sums by $n-1$ instead of $n$ to obtain covariances.





...... ? 


^ lAPTE VrrVABlATE ANALYSIS 

jjj m ulTIV 

*- tfVPOTH^- . . the present state of knowle<j e . 

TESTS 0 itivai"> ate amlyS n' and we have seen in the previ 0l ^ 

of n>ult> var , var iation, an . corre sponding sam n i 

421 The exact **J with »° r,n d dispc«‘ onS ordinar y maximum-likeliho * 

f Xat^'°r 1 'Methods based on Bayesian p^ 
i nter th 3 ^ i theory e nractical use. . literature but lead tn 

5-* A be pt» d u u “ d nt fhave ^“ffor sample, Dempster 1966). We 

method» J fiducial arg ocC asion ( s ’ which in an y case ^ ave a certain 

bias. . 


lt v *" • :, • .: irjj’ . ; • * 

■ 


,theses 



0 babiiH lc& v | s r esuu=> , n/rr e sti> i • 

VSgteZS* *- — “■ 


51311 S T e only point 
plausibility- 


1I1C ‘"-v * I 

,r T", "j 

Consider $|c|$ as an estimator of the parent $|\gamma|$. From equation (41.53) with $t = 1$,

$$E|c| = \prod_{j=1}^{p}\left(1 - \frac{j}{n}\right)|\gamma| \tag{42.1}$$

$$= \left\{1 - \frac{p(p+1)}{2n} + O(n^{-2})\right\}|\gamma|. \tag{42.2}$$

The bias of order $n^{-1}$ may be removed as follows. Denoting by $|c|_n$ the dispersion determinant based on $n$ observations and $|c|_{n-1}$ the similar determinant based on $n-1$, we construct the estimator

$$\operatorname{Est}|\gamma| = n|c|_{n} - (n-1)\operatorname{Av}|c|_{n-1}, \tag{42.3}$$

where the average on the right is taken over all the $n$ possible determinants obtained by dropping one observation. We then have, to order $n^{-1}$,

$$\frac{E(\operatorname{Est}|\gamma|)}{|\gamma|} = 1,$$

and the estimator is unbiassed to order $n^{-1}$. The idea is quite straightforward; the difficulty in applying it lies in the amount of computation involved in computing all the determinants, though this is well within the scope of an electronic computer.
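The computation in (42.3) is easily mechanized. The following sketch (an illustration, not part of the original text) forms the dispersion determinant with divisor equal to the number of observations used, recomputes it with each observation dropped in turn, and combines the results as in (42.3). The function names are assumptions. For $p = 1$ the procedure is known to reproduce the usual unbiassed variance estimator exactly, which gives a simple check.

```python
def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    size, value = len(a), 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            value = -value
        value *= a[col][col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
    return value


def dispersion_determinant(sample):
    """|c| with c_jk the product-sums about the means divided by the sample size."""
    n, p = len(sample), len(sample[0])
    mean = [sum(row[j] for row in sample) / n for j in range(p)]
    c = [[sum((row[j] - mean[j]) * (row[k] - mean[k]) for row in sample) / n
          for k in range(p)] for j in range(p)]
    return det(c)


def bias_reduced_determinant(sample):
    """Estimator (42.3): n|c|_n - (n-1) Av|c|_{n-1} over all leave-one-out subsamples."""
    n = len(sample)
    full = dispersion_determinant(sample)
    dropped = [dispersion_determinant(sample[:i] + sample[i + 1:]) for i in range(n)]
    return n * full - (n - 1) * sum(dropped) / n


if __name__ == "__main__":
    data = [[1.2, 0.7], [0.4, 1.9], [2.1, 1.1], [1.5, 2.4], [0.9, 0.3], [1.8, 1.6]]
    print(dispersion_determinant(data), bias_reduced_determinant(data))
```

The cost is $n+1$ determinant evaluations of order $p$, which is the computational burden the text refers to.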

Homogeneity tests

42.3 A natural generalization of the variance analysis considered in Chapter 35 arises when we consider samples from several multivariate populations and enquire whether the populations may be identical. There are, as usual, three types of hypothesis to consider:

$H$: that the populations have the same means and dispersions, i.e. are identical;

$H_1$: that the populations have the same dispersions but may differ in their means;

$H_2$: it being known that the populations have the same dispersions, the hypothesis is that they have the same means.


M 


42.4 For testing simple hypotheses, the Neyman-Pearson lemma of 22.10 (Vol. 2) applies to multivariate distributions without change. Similarly the likelihood-ratio method of Chapter 24, with the same plausibility, may be adopted as a test statistic for composite hypotheses.

One property of maximization procedures is worth noticing. If we are maximizing, say, $f(\theta_1, \theta_2)$ for variations in $\theta_1$ and $\theta_2$, we may solve the simultaneous equations

$$\frac{\partial f}{\partial\theta_1} = 0, \qquad \frac{\partial f}{\partial\theta_2} = 0. \tag{42.4}$$

It is, however, equivalent to solve $\partial f/\partial\theta_1 = 0$ for $\theta_1$, substitute in $f$, and then solve $df/d\theta_2 = 0$.




42.5 Consider, then, $k$ multivariate normal populations with means typified by $\mu_{jt}$ ($j = 1, 2, \ldots, p$; $t = 1, 2, \ldots, k$) and dispersions by $\gamma_{jlt}$, or equivalently $\sigma_{jt}\sigma_{lt}\rho_{jlt}$. Let there be a sample of $n_t$ from the $t$th population. If $\alpha_{jlt}$ is inverse to $\gamma_{jlt}$ the likelihood function of all samples together is

\[ L = \prod_{t=1}^{k} \left\{ \frac{|\alpha_{jlt}|^{\frac12 n_t}}{(2\pi)^{\frac12 p n_t}} \right\} \exp\left\{ -\frac12 \sum_{t=1}^{k} \sum_{i=1}^{n_t} \sum_{j,l=1}^{p} \alpha_{jlt}(x_{jit} - \mu_{jt})(x_{lit} - \mu_{lt}) \right\}. \tag{42.5} \]

If all $\mu$'s and $\gamma$'s are equal, the corresponding likelihood is

\[ L = \frac{|\alpha_{jl}|^{\frac12 n}}{(2\pi)^{\frac12 pn}} \exp\left\{ -\frac12 \sum_{t=1}^{k}\sum_{i=1}^{n_t} \sum_{j,l=1}^{p} \alpha_{jl}(x_{jit} - \mu_j)(x_{lit} - \mu_l) \right\}, \tag{42.6} \]

where

\[ n = \sum_{t=1}^{k} n_t. \tag{42.7} \]

In accordance with the usual procedure (Chapter 24), we estimate the parameters in (42.5) by ML and substitute in it to obtain the unconditioned maximum $L_u$. Likewise for (42.6) to obtain the conditioned maximum $L_0$. We then use the ratio $l = L_0/L_u$, or some monotonic function of it, as the test criterion.

The logarithm of the likelihood (42.5) becomes the sum of $k$ terms which, being independent, can be maximized separately. We find, as expected,

\[ \hat\mu_{jt} = \bar x_{jt}, \tag{42.9} \]

\[ \hat\gamma_{jlt} = c_{jlt}, \tag{42.10} \]

where $\bar x_{jt}$ and $c_{jlt}$ are the mean and the dispersion in the $t$th sample.


Substitution in the exponential term yields a constant, for, $\hat\alpha_{jlt}$ being inverse to the sample dispersion $c_{jlt}$,

\[ \sum_{j,l=1}^{p} \hat\alpha_{jlt}\, c_{jlt} = p. \tag{42.11} \]

Thus, except for a constant,

\[ L_u = \prod_{t=1}^{k} |c_{jlt}|^{-\frac12 n_t}. \tag{42.12} \]

Likewise from (42.6) we obtain

\[ L_0 = |c_{jl}|^{-\frac12 n}, \tag{42.13} \]

where $c_{jl}$ is the dispersion for all $k$ samples pooled together. The test criterion is then given by

\[ l_H = \frac{L_0}{L_u} = \prod_{t=1}^{k} \left\{ \frac{|c_{jlt}|}{|c_{jl}|} \right\}^{\frac12 n_t}. \tag{42.14} \]

$l_H$ may vary from 0 to 1. The nearer it is to unity, the more ready we are to accept the hypothesis that all means and all dispersions are equal.
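42.6 The same technique gives us tests for $H_1$ and $H_2$. We quote the results without proof.

Let $c_{jla}$ be the average dispersion taken over the $k$ populations, namely

\[ c_{jla} = \frac{1}{n} \sum_{t=1}^{k} \sum_{i=1}^{n_t} (x_{jit} - \bar x_{jt})(x_{lit} - \bar x_{lt}). \tag{42.15} \]

Then for $H_1$,

\[ l_{H_1} = \prod_{t=1}^{k} \left\{ \frac{|c_{jlt}|}{|c_{jla}|} \right\}^{\frac12 n_t}. \tag{42.16} \]

For $H_2$,

\[ l_{H_2} = \left\{ \frac{|c_{jla}|}{|c_{jl}|} \right\}^{\frac12 n}. \tag{42.17} \]

We note that, as in the univariate case (cf. Exercise 24.6),

\[ l_H = l_{H_1}\, l_{H_2}. \tag{42.18} \]

Our test criteria thus appear as the ratios of dispersion determinants.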
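The three criteria can be computed directly from sample dispersions. The sketch below is illustrative only and is not from the original text: it assumes $p = 2$, uses the divisor $n_t$ (or $n$) throughout as in the text, and works with made-up data; all function names are hypothetical.

```python
import math

def mean2(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def ssp2(rows, centre):
    """Sums of squares and products (Sxx, Syy, Sxy) about `centre`."""
    sxx = sum((r[0] - centre[0]) ** 2 for r in rows)
    syy = sum((r[1] - centre[1]) ** 2 for r in rows)
    sxy = sum((r[0] - centre[0]) * (r[1] - centre[1]) for r in rows)
    return sxx, syy, sxy

def gen_var(ssp, divisor):
    """Determinant of the 2x2 dispersion matrix SSP / divisor."""
    sxx, syy, sxy = ssp
    return (sxx * syy - sxy * sxy) / divisor ** 2

def log_criteria(samples):
    """Return (log l_H, log l_H1, log l_H2) as in (42.14), (42.16), (42.17)."""
    n = sum(len(s) for s in samples)
    pooled = [r for s in samples for r in s]
    within = [ssp2(s, mean2(s)) for s in samples]
    c_t = [gen_var(w, len(s)) for w, s in zip(within, samples)]
    c_a = gen_var(tuple(sum(w[i] for w in within) for i in range(3)), n)
    c = gen_var(ssp2(pooled, mean2(pooled)), n)
    log_h1 = sum(0.5 * len(s) * math.log(ct / c_a) for s, ct in zip(samples, c_t))
    log_h2 = 0.5 * n * math.log(c_a / c)
    log_h = sum(0.5 * len(s) * math.log(ct / c) for s, ct in zip(samples, c_t))
    return log_h, log_h1, log_h2

samples = [                                   # illustrative data only
    [(1, 2), (2, 1), (3, 4), (4, 3)],
    [(2, 5), (3, 3), (5, 6), (6, 4)],
    [(0, 1), (1, 3), (2, 2), (4, 5)],
]
log_h, log_h1, log_h2 = log_criteria(samples)
```

Because $\sum n_t = n$, the logarithms add exactly, which is the factorization (42.18); each criterion lies between 0 and 1, so each logarithm is non-positive.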


42.7 To apply the tests we require the distributions of the criteria. In a few cases they can be obtained explicitly. In all cases we can obtain moments after the manner of 41.12. For practical purposes, however, it is enough to rely on an approximation due to Wilks, to the effect that $-2\log l$ is distributed as $\chi^2$ with degrees of freedom equal to the number of constraints imposed by the hypothesis.

We have considered our criteria in the form in which they naturally arise. Clearly any power of them would serve equally well. In particular, we might use the $(2/n)$th power, in which case the criterion for $H_2$ in (42.17) becomes the ratio of determinants, and it is $-n$ times the logarithm of this ratio which is distributed as $\chi^2$.

Consider now the moments of the LR criterion (42.14) testing $H$. We have

\[ c_{jl} = c_{jla} + \frac{1}{n}\sum_{t=1}^{k} n_t (\bar x_{jt} - \bar x_j)(\bar x_{lt} - \bar x_l), \tag{42.19} \]

where $\bar x_j$ is the mean of all $n$ observations, so that the second term on the right is the dispersion of sample means about the overall mean. Following the method used in 41.17 we can write the likelihood of the dispersions in two ways, one involving the $|c_{jlt}|$ and the other involving $|c_{jla}|$ and $|c_{jl}|$. We then find

\[ E(l_H^r) = \prod_{t=1}^{k} \left[ \left(\frac{n}{n_t}\right)^{\frac12 p r n_t} \prod_{j=1}^{p} \frac{\Gamma\{\frac12(n_t(1+r) - j)\}}{\Gamma\{\frac12(n_t - j)\}} \right] \times \prod_{j=1}^{p} \frac{\Gamma\{\frac12(n-j)\}}{\Gamma[\frac12\{n(1+r) - j\}]}. \tag{42.20} \]

This and the following results are due to Wilks (1932), to whom reference may be made for details.

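In a similar way it may be shown that

\[ E(l_{H_1}^r) = \prod_{t=1}^{k} \left[ \left(\frac{n}{n_t}\right)^{\frac12 p r n_t} \prod_{j=1}^{p} \frac{\Gamma\{\frac12(n_t(1+r) - j)\}}{\Gamma\{\frac12(n_t - j)\}} \right] \times \prod_{j=1}^{p} \frac{\Gamma\{\frac12(n-k+1-j)\}}{\Gamma[\frac12\{n(1+r) - k + 1 - j\}]}, \tag{42.21} \]

\[ E(l_{H_2}^r) = \prod_{j=1}^{p} \frac{\Gamma[\frac12\{n(1+r) - k + 1 - j\}]\ \Gamma\{\frac12(n-j)\}}{\Gamma\{\frac12(n-k+1-j)\}\ \Gamma[\frac12\{n(1+r) - j\}]}. \tag{42.22} \]

Note that, as in Exercise 24.6, the moments of $l_H$ are the product of the moments of the other two $l$'s. This implies that $l_{H_1}$ and $l_{H_2}$ are independent when $H$ holds, which is what we might expect from the independence of means and dispersions.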


42.9 In passing we may remark on a possible source of confusion. In our notation $n$ is the sample number, not the degrees of freedom. The form of the frequency distribution as written, for example, at (42.5) contains the $n$'s only in the preliminary constants. If the exponent were reducible to a quadratic form which transformed to a sum of $p-q$ squares the appropriate preliminary constants would have $p-q$ instead of $p$. And if the sum over sample values were equivalent to $n-q$ instead of $n$ values we should have $n-q$ instead of $n$ in the constants. Whether this affects the exponents in (42.5) and (42.6) depends on how we define the dispersions. In our usage the divisor is always $n$. Some writers use $\nu_t = n_t - 1$ instead of our $n_t$ and $\nu = n - k$ instead of our $n$, in defining dispersions. The reason for so doing is the one noticed in Example 24.6, Vol. 2: the test is nearer to being unbiassed.

i 

42.10 For $p = 1$ the distributions reduce, of course, to those already familiar in univariate theory (cf. Exercises 24.4-6). The reader may care to verify as an exercise that this is so.

For $p = 2$ we find from (42.22) for the moments of $l_{H_2}$

\[ E(l_{H_2}^r) = \frac{\Gamma[\frac12\{n(1+r)-k\}]\ \Gamma[\frac12\{n(1+r)-k-1\}]\ \Gamma\{\frac12(n-1)\}\ \Gamma\{\frac12(n-2)\}}{\Gamma\{\frac12(n-k)\}\ \Gamma\{\frac12(n-k-1)\}\ \Gamma[\frac12\{n(1+r)-1\}]\ \Gamma[\frac12\{n(1+r)-2\}]}. \tag{42.23} \]

Use of the duplication formula for the Gamma function reduces this to

\[ E(l_{H_2}^r) = \frac{\Gamma(n-2)\ \Gamma\{n(1+r) - k - 1\}}{\Gamma(n-k-1)\ \Gamma\{n(1+r) - 2\}}. \tag{42.24} \]

The moments of $l_{H_2}^{1/n}$, namely $\{|c_{jla}|/|c_{jl}|\}^{\frac12}$, are then those of

\[ dF = \frac{1}{B(n-k-1,\ k-1)}\,(1-u)^{k-2}\, u^{n-k-2}\, du, \qquad 0 \le u \le 1. \tag{42.25} \]

If $p$ is even, we can use the duplication formula for the Gamma function to reduce the moments of the $l$-criteria to products of Beta functions, and hence represent the criteria as the product of certain independent Type I variables. But this fact is not very useful in giving explicit closed form to the distribution functions.
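The reduction by the duplication formula for $p = 2$ can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original text: the function names are hypothetical, and the values $n = 60$, $k = 5$, $r = 0.1$ are arbitrary. It compares the logarithm of the four-Gamma-pair form of the moment with the reduced form, and with the moment of order $nr$ of a Beta variable with parameters $(n-k-1,\ k-1)$.

```python
from math import lgamma

def log_moment_full(n, k, r):
    """log E(l^r) for p = 2 written as the product of Gamma pairs in (42.22)."""
    big_n = n * (1 + r)
    num = (lgamma((big_n - k) / 2) + lgamma((big_n - k - 1) / 2)
           + lgamma((n - 1) / 2) + lgamma((n - 2) / 2))
    den = (lgamma((n - k) / 2) + lgamma((n - k - 1) / 2)
           + lgamma((big_n - 1) / 2) + lgamma((big_n - 2) / 2))
    return num - den

def log_moment_reduced(n, k, r):
    """The same quantity after applying the duplication formula."""
    big_n = n * (1 + r)
    return lgamma(n - 2) + lgamma(big_n - k - 1) - lgamma(n - k - 1) - lgamma(big_n - 2)

def log_beta_moment(a, b, s):
    """log of the s-th moment of a Beta(a, b) variable."""
    return lgamma(a + s) + lgamma(a + b) - lgamma(a) - lgamma(a + b + s)

n, k, r = 60, 5, 0.1                      # arbitrary illustrative values
full = log_moment_full(n, k, r)
reduced = log_moment_reduced(n, k, r)
beta = log_beta_moment(n - k - 1, k - 1, n * r)
```

The three quantities agree to rounding error, which is the content of the passage from the Gamma-product form to the Beta distribution.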


42.11 The most useful results for testing hypotheses in practice are asymptotic expressions. Following the treatment by Box (1949), we shall develop a general method along the lines of Example 41.4 in the previous chapter. The method, in fact, is applicable to a wide range of criteria depending on likelihood ratios.

Consider a variable $W$ with moments

\[ E(W^r) = \text{constant} \times \left[ \frac{\prod_j y_j^{y_j}}{\prod_i x_i^{x_i}} \right]^{r} \frac{\prod_i \Gamma\{x_i(1+r) + \xi_i\}}{\prod_j \Gamma\{y_j(1+r) + \eta_j\}}, \tag{42.26} \]

where

\[ \sum_i x_i = \sum_j y_j. \tag{42.27} \]

In our treatment $x_i$ and $y_j$ will be large, of the order of $n$, the total sample number, and we may write $O(n)$ indifferently for $O(x)$ or $O(y)$.

Now take

\[ M = -2\log W \tag{42.28} \]

and let us find the characteristic function of $\rho M$, where $\rho$ lies between 0 and 1 and is a scaling constant which we may later choose at our convenience. Taking $t$ as the dummy variable in the c.f., we have

\[ \phi(t) = E(\exp it\rho M) = E(W^{-2it\rho}) = \text{constant}\times \left[ \frac{\prod_j y_j^{y_j}}{\prod_i x_i^{x_i}} \right]^{-2it\rho} \frac{\prod_i \Gamma\{x_i(1-2it\rho) + \xi_i\}}{\prod_j \Gamma\{y_j(1-2it\rho)+\eta_j\}}. \tag{42.29} \]

Putting now

\[ \beta_i = (1-\rho)x_i, \qquad \varepsilon_j = (1-\rho) y_j, \tag{42.30} \]


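we have for the cumulant-generating function of $\rho M$

\[ \psi(t) = g(t) - g(0), \tag{42.31} \]

where

\[ g(t) = 2it\rho\Big\{\sum_i x_i\log x_i - \sum_j y_j \log y_j\Big\} + \sum_i \log\Gamma\{\rho x_i(1-2it) + \beta_i + \xi_i\} - \sum_j \log\Gamma\{\rho y_j(1-2it) + \varepsilon_j + \eta_j\}. \tag{42.32} \]

We now use the expansion

\[ \log\Gamma(x+h) = \log\sqrt{(2\pi)} + (x + h - \tfrac12)\log x - x - \sum_{r=1}^{s} \frac{(-1)^r B_{r+1}(h)}{r(r+1)\,x^r} + \text{remainder}, \tag{42.33} \]

where the $B$'s are Bernoulli polynomials of order unity. We then find, on expanding (42.31) and the corresponding $g(0)$,

\[ \psi(t) = -\tfrac12 f \log(1-2it) + \sum_{r=1}^{s}\omega_r\{(1-2it)^{-r} - 1\} + \text{remainder}, \tag{42.34} \]

where, $m_x$ and $m_y$ being the numbers of $x$'s and $y$'s,

\[ f = -2\Big\{\sum_i \xi_i - \sum_j \eta_j - \tfrac12(m_x - m_y)\Big\}, \qquad \omega_r = \frac{(-1)^{r+1}}{r(r+1)}\Big\{\sum_i \frac{B_{r+1}(\beta_i+\xi_i)}{(\rho x_i)^r} - \sum_j \frac{B_{r+1}(\varepsilon_j+\eta_j)}{(\rho y_j)^r}\Big\}. \tag{42.35} \]

We must remember that, from (42.30), $\beta$ and $\varepsilon$ are $O(n)$ unless $1-\rho$ is small. For $\rho = 1$ we have $\omega_1 = O(n^{-1})$. Thus we have

\[ \psi(t) = -\tfrac12 f\log(1-2it) + O(n^{-1}), \tag{42.36} \]

and hence, to this order, $-2\log W$ is distributed as $\chi^2$ with $f$ degrees of freedom.

Taking the approximation one stage further, we find, since

\[ B_2(x) = x^2 - x + \tfrac16, \]

\[ \omega_1 = \frac{1}{2\rho}\Big\{\sum_i \frac{(\beta_i+\xi_i)^2 - (\beta_i+\xi_i) + \frac16}{x_i} - \sum_j \frac{(\varepsilon_j+\eta_j)^2 - (\varepsilon_j+\eta_j)+\frac16}{y_j}\Big\}, \]

which, by use of (42.30) and (42.27), reduces to

\[ \omega_1 = \frac{1}{2\rho}\Big\{ -(1-\rho) f + \sum_i \frac{\xi_i^2 - \xi_i+\frac16}{x_i} - \sum_j\frac{\eta_j^2-\eta_j+\frac16}{y_j}\Big\}. \tag{42.37} \]

If we now take $\rho$ such that $\omega_1 = 0$ we have

\[ \psi(t) = -\tfrac12 f\log(1-2it) + O(n^{-2}). \tag{42.38} \]

In general then there exists a constant $\rho$ such that $\rho M = -2\rho\log W$ is distributed as $\chi^2$ with $f$ degrees of freedom to order $n^{-1}$.

Box (1949) has pushed the investigation a good deal further but, as we have remarked, the cruder approximation (42.36) is usually good enough for practical purposes. See also Lawley (1956b), whose work was summarized in 24.9, Vol. 2.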


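Example 42.1

Let us find the approximations to the distribution of $l_H$, the moments of which are given by (42.20). Comparison with (42.26) shows that they are of the required form with

\[ x = \tfrac12 n_t,\quad \xi = -\tfrac12 j \quad (j = 1, 2, \ldots, p;\ t = 1, 2, \ldots, k); \qquad y = \tfrac12 n,\quad \eta = -\tfrac12 j \quad (j = 1, 2, \ldots, p). \]

To the first approximation, $-2\log l_H$ is distributed as $\chi^2$ with degrees of freedom

\[ f = \tfrac12 (k-1)\,p\,(p+3). \tag{42.39} \]

This is, in fact, the number of parameters in the likelihood (42.5) less the number in (42.6), i.e. the number of constraints imposed by the hypothesis.

For a second approximation we find from (42.37), with $\omega_1 = 0$,

\[ 0 = -(1-\rho)\,\tfrac12 (k-1)p(p+3) + \Big(\sum_{t=1}^{k}\frac{1}{n_t} - \frac{1}{n}\Big)\frac{p(2p^2+9p+11)}{12}, \]

or

\[ \rho = 1 - \Big(\sum_{t=1}^{k}\frac{1}{n_t} - \frac{1}{n}\Big)\frac{2p^2+9p+11}{6(k-1)(p+3)}. \tag{42.40} \]

In results of this kind it should be remembered that, in our convention, $n$ and $n_t$ are sample numbers. If the dispersions are defined with $\nu_t = n_t - 1$ and $\nu = n - k$ in place of $n_t$ and $n$ (cf. 42.9), $f$ is not affected, but it makes a difference to the second term in (42.40). In this case $\xi$ is $\tfrac12(1-j)$ and $\eta = \tfrac12(k-j)$, and the corresponding expression to (42.40) is

\[ \rho = 1 - \Big(\sum_{t=1}^{k}\frac{1}{\nu_t} - \frac{1}{\nu}\Big)\frac{2p^2+3p-1}{6(k-1)(p+3)} - \frac{p-k+2}{\nu(p+3)}. \tag{42.41} \]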
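The constants $f$ and $\rho$ of this example are easily computed. The sketch below is illustrative and not part of the original text: `box_constants` applies the general recipe of 42.11 (degrees of freedom, then $\rho$ from $\omega_1 = 0$) in exact rational arithmetic, and `box_lH` evaluates the closed forms for $l_H$; both names, and the choice $p = 2$ with five samples of twelve, are assumptions made for illustration.

```python
from fractions import Fraction

def box_constants(xs, xis, ys, etas):
    """General f and rho from the x, xi, y, eta of the moment form (omega_1 = 0)."""
    f = -2 * (sum(xis) - sum(etas) - Fraction(len(xs) - len(ys), 2))
    b2 = lambda v: v * v - v + Fraction(1, 6)          # Bernoulli polynomial B_2
    spread = sum(b2(xi) / x for x, xi in zip(xs, xis)) - sum(b2(e) / y for y, e in zip(ys, etas))
    return f, 1 - spread / f

def box_lH(p, sizes):
    """Closed forms for the criterion l_H: f and rho."""
    k, n = len(sizes), sum(sizes)
    f = Fraction((k - 1) * p * (p + 3), 2)
    spread = sum(Fraction(1, nt) for nt in sizes) - Fraction(1, n)
    rho = 1 - spread * Fraction(2 * p * p + 9 * p + 11, 6 * (k - 1) * (p + 3))
    return f, rho

p, sizes = 2, [12] * 5                    # illustrative: five samples of twelve
n = sum(sizes)
xs = [Fraction(nt, 2) for nt in sizes for j in range(1, p + 1)]
xis = [Fraction(-j, 2) for nt in sizes for j in range(1, p + 1)]
ys = [Fraction(n, 2) for j in range(1, p + 1)]
etas = [Fraction(-j, 2) for j in range(1, p + 1)]

f_general, rho_general = box_constants(xs, xis, ys, etas)
f_closed, rho_closed = box_lH(p, sizes)
```

With these inputs both routes give $f = 20$ and $\rho = 263/300$, so that $-2\rho\log l_H$ would be referred to $\chi^2$ with 20 degrees of freedom.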


Tests of independence

42.12 The set of tests which we proceed to develop are, with few exceptions, all based on the foregoing ideas: the deduction of a likelihood criterion, the ascertainment of its moments, and the approximation to a $\chi^2$ test or something a little more refined. We need not spend too much time on the derivation of the details, which may be left for verification to the student.

First of all we consider a test of independence. Given, as usual, a sample of $n$ from a $p$-variate normal population, and given a division of the variables into $q$ subsets with $p_1, p_2, \ldots, p_q$ members ($\sum p_j = p$), we wish to test the hypothesis that each subset is independent of the others. We shall be particularly interested in the cases $q = 2$ and $q = p$.

If the parent dispersion matrix $\gamma$ is partitioned into $q^2$ components

\[ \gamma = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1q} \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2q} \\ \vdots & & & \vdots \\ \gamma_{q1} & \gamma_{q2} & \cdots & \gamma_{qq} \end{pmatrix}, \tag{42.42} \]

the hypothesis under test is that

\[ \gamma_{jk} = 0, \qquad j \ne k. \tag{42.43} \]


We find for the likelihood ratio criterion, the alternative against (42.43) being (42.42),

\[ l = \frac{|c|^{\frac12 n}}{\prod_{j=1}^{q} |c_{jj}|^{\frac12 n}}, \tag{42.44} \]

where as usual $(c_{jk})$ is the sample matrix. We can write equivalently

\[ l^{2/n} = \frac{|c|}{\prod_{j=1}^{q} |c_{jj}|}. \tag{42.45} \]

Under the hypothesis, $l$ is independent of its denominator and $|c_{jj}|$ is independent of $|c_{kk}|$. We then find from (41.53)

\[ E(l^r) = \frac{\prod_{j=1}^{p} \Gamma\{\frac12(n-j)+\frac12 rn\}\ \prod_{k=1}^{q}\prod_{j=1}^{p_k}\Gamma\{\frac12(n-j)\}}{\prod_{j=1}^{p} \Gamma\{\frac12(n-j)\}\ \prod_{k=1}^{q}\prod_{j=1}^{p_k}\Gamma\{\frac12(n-j)+\frac12 rn\}}. \tag{42.46} \]

$-2\log l$ is distributed approximately as $\chi^2$ with

\[ f = \tfrac12\Big\{p(p+1) - \sum_{j=1}^{q} p_j(p_j+1)\Big\}. \tag{42.47} \]

For the more accurate approximation we have

\[ \rho = 1 - \frac{2(p^3 - \sum p_j^3) + 9(p^2 - \sum p_j^2)}{6n(p^2 - \sum p_j^2)}. \tag{42.48} \]

In the case $p_j = 1$, all $j$, when we are testing the independence of all $p$ variables,

\[ f = \tfrac12 p(p-1), \tag{42.49} \]

\[ \rho = 1 - \{(2p+11)/6n\}. \tag{42.50} \]

The criterion in this case is the $\frac12 n$th power of $|c|$ divided by the product of the diagonal elements, namely the variances; or, equivalently, the $\frac12 n$th power of the correlation determinant.

T. W. Anderson (1958) gives some further details and references to particular cases worked out explicitly by Wilks (1935).

Daly (1940) and Narain (1950) showed that tests of independence based on determinantal ratios are completely unbiassed.
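The degrees of freedom and the multiplier $\rho$ for the test of independence can be evaluated for any partition of the variables. The following is a minimal sketch, not part of the original text; the function names, the partition $(1, 1, 1)$ and the sample number $n = 20$ are hypothetical choices for illustration.

```python
from fractions import Fraction
import math

def independence_constants(parts, n):
    """f and rho for a partition of p variables into subsets of sizes `parts`."""
    p = sum(parts)
    s2 = sum(pj ** 2 for pj in parts)
    s3 = sum(pj ** 3 for pj in parts)
    f = Fraction(p * (p + 1) - sum(pj * (pj + 1) for pj in parts), 2)
    rho = 1 - Fraction(2 * (p ** 3 - s3) + 9 * (p ** 2 - s2), 6 * n * (p ** 2 - s2))
    return f, rho

def statistic(n, rho, det_ratio):
    """-2 rho log l, where det_ratio = l^(2/n) = |c| / prod |c_jj|."""
    return -n * float(rho) * math.log(det_ratio)

f, rho = independence_constants([1, 1, 1], 20)   # all p = 3 variables tested for independence
```

For subsets of one variable each, the general expressions reduce to $f = \frac12 p(p-1)$ and $\rho = 1 - (2p+11)/6n$, here $f = 3$ and $\rho = 103/120$.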


V 

' 


42.13 We next consider a test whether an observed dispersion matrix $c$ can have arisen from a population with a matrix proportional to a given matrix $\gamma$. Since $\gamma$ is known, we can transform it by a linear transformation to the unit matrix $I$. Writing $c$ for the corresponding transform of the observed matrix, we test the hypothesis that the parent matrix is proportional to $\sigma^2 I$, where $\sigma$ is unknown. This is known as a sphericity test (Mauchly, 1940). We now find for the criterion

\[ l = \left\{ \frac{|c|}{\left(\frac{1}{p}\,\mathrm{trace}\ c\right)^{p}} \right\}^{\frac12 n}. \tag{42.51} \]

The moments of $l$ are given by

\[ E(l^{2r/n}) = p^{rp}\, \frac{\Gamma\{\frac12 p(n-1)\}}{\Gamma\{\frac12 p(n-1) + pr\}} \prod_{j=1}^{p} \frac{\Gamma\{\frac12(n-j) + r\}}{\Gamma\{\frac12(n-j)\}}, \tag{42.52} \]

and as usual $-2\log l$ is distributed as $\chi^2$ with

\[ f = \tfrac12 p(p+1) - 1. \tag{42.53} \]

For the second approximation

\[ \rho = 1 - \frac{2p^2 + p + 2}{6p(n-1)}. \tag{42.54} \]

Gleser (1966) shows that the sphericity test is unbiassed.

42.14 The homogeneity tests can be used to generalize to $p$ dimensions the tests of variance analysis of univariate theory. Whereas in the latter we are concerned to compare independent estimators of variance, in the former we compare generalized variances, that is to say, dispersion determinants.

Example 42.2

We consider a two-dimensional case ($p = 2$), following Pearson and Wilks (1933). Five samples are available, each of twelve members, of aluminium die-castings ($k = 5$, $n_t = 12$ for all $t$, $n = 60$). On each of the 60 specimens two measurements are taken: tensile strength (in 1000 lb per sq. inch), which we call $x$, and hardness (Rockwell), which we call $y$. The data may be summarized as follows:

Sample           x                    y            Correlation
            mean     s.d.        mean     s.d.
  1        33.399   2.565       68.49    10.19        0.683
  2        28.216   4.318       68.02    14.49        0.876
  3        30.313   2.188       66.57    10.17        0.714
  4        33.150   3.964       76.12    11.18        0.715
  5        34.269   2.715       69.92     9.88        0.805

The analysis gives the following results:

Sample    Sums of squares       Sum of products   Generalized variances      log10 of
          about means           about means       (dispersion determinants)  generalized
            x          y                                                     variances
  1        78.948    1247.18        ...              365.204                  2.56254
  2       223.695    2519.31        ...              910.401                  2.95923
  3        57.448    1241.78        ...              243.029                  2.38566
  4       187.618    1473.44        ...              938.451                  2.97241
  5        88.456    1171.73        ...              253.281                  2.40360
Totals    636.165    7653.44      1697.52                                    13.28344

The pooled variances and covariances about respective means are obtained by dividing the totals by $n$. We have

$c_{11a} = 636.165/60 = 10.6028$
$c_{22a} = 7653.44/60 = 127.5573$
$c_{12a} = 1697.52/60 = 28.2920$
$|c_{jla}| = 552.018$

The criterion is then, from (42.16) in the form $l^{2/n}$, given by

\[ \frac{2}{n}\log l = \frac15 \sum_{t=1}^{5} \{\log|c_{jlt}| - \log|c_{jla}|\} = \bar{1}.914,73, \]

with logs taken to base 10, giving $l^{2/n} = 0.8217$. For a test, $-2\log_e l = 11.78$. The number of degrees of freedom is $3(5-1)$, namely 12. The observations are consistent with homogeneity of dispersions, and we can now proceed to test the hypothesis that the means are equal given equality of dispersions. For this, we require to apply (42.17) and need the pooled variance about pooled means. The data are summarized in the following table:

Source             d.fr.          SS (x)      SS (y)      SP (xy)
Between samples    k - 1 = 4      306.089      662.77      214.86
Within samples     n - k = 55     636.165     7653.44     1697.52
Total              n - 1 = 59     942.254     8316.21     1912.38

The pooled determinant $|c_{jl}|$ is then 1160.77 and the criterion is given by

\[ -60 \log_e (552.018/1160.77) = 44.59. \]

The number of degrees of freedom is $2(5-1) = 8$. The result rejects $H_2$ at extremely small test sizes.

We conclude that there is heterogeneity in the means. We now test $x$ and $y$ separately.

                    Estimates of variance
                       x           y         d.fr.
Between samples      76.522      165.69        4
Within samples       11.566      139.15       55

An ordinary $F$-test shows that at the 1 per cent point the differences between tensile strength, but not the differences between hardness, are responsible for the heterogeneity.
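The arithmetic of Example 42.2 can be retraced from the summary figures alone. The sketch below is illustrative and not part of the original text: it takes the sums of squares and products and the five generalized variances as quoted in the example, and recomputes the pooled determinants and the two test statistics; the variable and function names are hypothetical.

```python
import math

n, n_t = 60, 12
within = (636.165, 7653.44, 1697.52)    # SS(x), SS(y), SP(xy) within samples
total = (942.254, 8316.21, 1912.38)     # the same about the pooled means
gen_vars = [365.204, 910.401, 243.029, 938.451, 253.281]   # |c_jlt| for the five samples

def gen_var(ss_x, ss_y, sp, divisor):
    """Determinant of the 2x2 dispersion matrix with the given divisor."""
    return (ss_x * ss_y - sp * sp) / divisor ** 2

c_a = gen_var(*within, n)    # pooled within-sample determinant |c_jla|
c = gen_var(*total, n)       # determinant about pooled means |c_jl|

# -2 log l for H1 (equal dispersions) and for H2 (equal means given equal dispersions)
stat_h1 = -sum(n_t * math.log(g / c_a) for g in gen_vars)
stat_h2 = -n * math.log(c_a / c)
```

These reproduce, to rounding, the values 552.018, 1160.77, 11.78 and 44.59 of the example; the statistics would be referred to $\chi^2$ with 12 and 8 degrees of freedom respectively.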




Multivariate regression

42.15 We now suppose that our variables $x$ are linearly related to a set of $z$'s, which may be regarded as fixed, by

\[ \underset{p\times n}{x} = \underset{p\times q}{\beta}\ \underset{q\times n}{z} + \underset{p\times n}{e}. \tag{42.55} \]

$\beta$ is a $p \times q$ matrix of coefficients, and $e$ is a $p\times n$ matrix of errors. If its sub-vectors $e_1, e_2, \ldots, e_p$ were independent we should, of course, have a set of $p$ independent




































regressions. We assume the errors to have zero means, and hence shall usually take one of the $z$'s to represent means (cf. Exercise 19.1). The errors have a dispersion matrix which we shall write $\sigma$. Our object is to estimate $\beta$ and $\sigma$.

(42.55) is, in a different notation, a generalization of the general linear model of Chapter 19 and subsequently. Here, however, we assume normality from the outset.


. ■«*. rase we estimate p »y 

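42.16 As in the univariate case we estimate $\beta$ by maximizing the likelihood, which is given by

\[ L = \frac{|\alpha|^{\frac12 n}}{(2\pi)^{\frac12 pn}} \exp\Big\{-\frac12\sum_{t=1}^{n}\sum_{j,l=1}^{p} \alpha_{jl}\Big(x_{jt} - \sum_{k=1}^{q}\beta_{jk}z_{kt}\Big)\Big(x_{lt} - \sum_{m=1}^{q} \beta_{lm}z_{mt}\Big)\Big\}, \tag{42.56} \]

where $\alpha$ is inverse to the dispersion matrix $\sigma$ and the suffix $t$ is a sample label. For the ML estimator of $\beta_{sk}$ we have

\[ \frac{\partial\log L}{\partial\beta_{sk}} = \sum_{t=1}^{n}\sum_{l=1}^{p} \alpha_{sl}\Big(x_{lt} - \sum_{m=1}^{q}\beta_{lm}z_{mt}\Big)z_{kt} = 0. \tag{42.57} \]

Writing

\[ u_{sk} = \sum_{t} z_{kt}\,x_{st} = x_s z_k', \tag{42.58} \]

\[ v_{sk} = \sum_{t} z_{kt}\,z_{st} = z_s z_k', \tag{42.59} \]

we have from (42.57)

\[ \sum_{l} \alpha_{sl}\Big(u_{lk} - \sum_{m=1}^{q}\hat\beta_{lm}v_{mk}\Big) = 0. \]

Since $\alpha$ is non-singular this gives

\[ u_{sk} - \sum_{m=1}^{q}\hat\beta_{sm}v_{mk} = 0, \tag{42.60} \]

which we may also write as

\[ \underset{p\times q}{\hat\beta} = \underset{p\times q}{u}\ \underset{q\times q}{v^{-1}}. \tag{42.61} \]

For the ML estimator of $\alpha$ we have

\[ \frac{\partial\log L}{\partial\alpha_{jl}} = \frac{n}{2}\,\frac{A_{jl}}{|\alpha|} - \frac12\sum_{t}\Big(x_{jt} - \sum_{k}\hat\beta_{jk}z_{kt}\Big)\Big(x_{lt} - \sum_{m}\hat\beta_{lm}z_{mt}\Big) = 0, \tag{42.62} \]

where $A_{jl}$ is the cofactor of $\alpha_{jl}$ in $|\alpha|$. Bearing in mind that $\alpha$ is inverse to $\sigma$ we find

\[ \hat\sigma_{jk} = \frac{1}{n}\sum_{t}\Big(x_{jt} - \sum_{l}\hat\beta_{jl}z_{lt}\Big)\Big(x_{kt} - \sum_{m}\hat\beta_{km}z_{mt}\Big), \tag{42.63} \]

which we may also write

\[ \hat\sigma = \frac{1}{n}\,(x - \hat\beta z)(x - \hat\beta z)'. \tag{42.64} \]

42.17 Since the $z$'s are fixed we find from (42.61)

\[ E(\hat\beta) = E(x)\,z'v^{-1} = \beta zz'v^{-1} = \beta. \tag{42.65} \]

Writing temporarily $V$ for the matrix $v^{-1}$, we have from (42.61)

\[ \hat\beta_{jk} = \sum_{m} u_{jm}V_{mk}. \tag{42.66} \]

From (42.55), multiplying by $z_{lt}$ and summing over $t$, then by $V_{lk}$ and summing over $l$, we find

\[ \sum_{l} u_{jl}V_{lk} = \beta_{jk} + \sum_{t}\sum_{l} e_{jt}\,z_{lt}V_{lk}. \tag{42.67} \]

Hence

\[ \hat\beta_{jk} - \beta_{jk} = \sum_{t}\sum_{l} e_{jt}\,z_{lt}V_{lk}. \tag{42.68} \]

Remembering that $e_{jt}$ and $e_{ku}$ are independent unless $t = u$ we find, with appropriate dummy suffixes for the summations,

\[ E(\hat\beta_{jk} - \beta_{jk})(\hat\beta_{lm} - \beta_{lm}) = E\Big(\sum_t\sum_s e_{jt}z_{st}V_{sk}\ \sum_u\sum_r e_{lu}z_{ru}V_{rm}\Big) = \sigma_{jl}\sum_{s}\sum_{r} v_{sr}V_{sk}V_{rm} = \sigma_{jl}V_{km}. \tag{42.69} \]

There are $pq$ quantities $\beta$. The estimators $\hat\beta$ are distributed about mean $\beta$ with dispersions given by (42.69). We may write equivalently, $\beta_j$ denoting the $j$th row of $\beta$,

\[ E(\hat\beta_j - \beta_j)'(\hat\beta_l - \beta_l) = \sigma_{jl}\,v^{-1}. \tag{42.70} \]

As the $\hat\beta$'s are linear functions of $x$ they are jointly normally distributed.

By putting $p = 1$ and transposing all matrices in (42.55), we return to the LS theory of Chapter 19, as the reader should verify for (42.61) and (42.70).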
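The estimators $\hat\beta = uv^{-1}$ and $\hat\sigma$ can be computed in a few lines. The sketch below is illustrative only and not part of the original text: it assumes $p = 2$ and $q = 2$ (one $z$ being constant, to represent means), uses made-up data, and employs small hand-written matrix helpers whose names are hypothetical. It also checks that the residuals are orthogonal to the $z$'s and that the residual form of $\hat\sigma$ equals $(xx' - \hat\beta v\hat\beta')/n$.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def fit(x, z):
    """ML estimators for x = beta z + e with x (p x n) and z (2 x n)."""
    n = len(x[0])
    u = matmul(x, transpose(z))             # u = x z'   (p x q)
    v = matmul(z, transpose(z))             # v = z z'   (q x q)
    beta = matmul(u, inv2(v))               # beta-hat = u v^{-1}
    fitted = matmul(beta, z)
    resid = [[x[j][t] - fitted[j][t] for t in range(n)] for j in range(len(x))]
    sigma = [[s / n for s in row] for row in matmul(resid, transpose(resid))]
    return beta, resid, sigma, v

z = [[1, 1, 1, 1, 1], [0, 1, 2, 3, 4]]                       # constant term and one regressor
x = [[1.0, 2.1, 2.9, 4.2, 5.1], [2.0, 1.5, 1.1, 0.4, 0.1]]   # illustrative data only
beta, resid, sigma, v = fit(x, z)

n = len(x[0])
xx = matmul(x, transpose(x))
bvb = matmul(matmul(beta, v), transpose(beta))
sigma_alt = [[(xx[i][j] - bvb[i][j]) / n for j in range(2)] for i in range(2)]
orth = matmul(resid, transpose(z))          # should vanish: residuals orthogonal to the z's
```

The divisor is $n$ throughout, in keeping with the convention of the text.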


42.18 We may write

\[ \sum_t \Big(x_j - \sum_l \beta_{jl}z_l\Big)\Big(x_k - \sum_m \beta_{km}z_m\Big) = \sum_t \Big(x_j - \sum_l \hat\beta_{jl}z_l\Big)\Big(x_k - \sum_m \hat\beta_{km}z_m\Big) + \sum_t \sum_l (\hat\beta_{jl} - \beta_{jl})z_l \sum_m (\hat\beta_{km} - \beta_{km})z_m, \]

the cross-products vanishing,

\[ = \sum_t \Big(x_j - \sum_l \hat\beta_{jl}z_l\Big)\Big(x_k - \sum_m \hat\beta_{km}z_m\Big) + \sum_l\sum_m (\hat\beta_{jl} - \beta_{jl})(\hat\beta_{km} - \beta_{km})\,v_{lm}. \tag{42.71} \]

This is analogous to the univariate splitting of the sum of squares of errors into sum of squares of residuals (deviations from the estimated regression line) and a term due to the deviation of the estimated from the true parameter values (cf. (19.42), Vol. 2). It may be shown, by the argument used in reaching (42.69), that the last term on the right in (42.71) has an expectation of $q\sigma_{jk}$. The first term on the right in (42.71) is a quadratic form in the $x$'s, which are multivariate normal. We do not, however, test one against the other, as in the univariate case, but the former against the whole.

42.19 To do this we require a theorem to the effect that the estimated dispersion $\hat\sigma$ of (42.64) is distributed in Wishart's form with $n-q$ instead of $n$. From (42.64), (42.59) and (42.60) we find the equivalent form

\[ \hat\sigma = \frac{1}{n}\,(xx' - \hat\beta v\hat\beta'). \tag{42.72} \]

We use the geometrical representation of 27.22, Vol. 2, in which each variable is represented by a vector in the $n$-dimensional sample space. The $q$ vectors $z$ define a $q$-space in this space, the regression vectors $\hat\beta z$ themselves lying in the $q$-space.

Now

\[ e = x - \beta z = x - \hat\beta z + (\hat\beta - \beta)z \]

and the two parts on the right are orthogonal in the $n$-space. For

\[ \sum_{t=1}^{n} \Big(x_{jt} - \sum_l \hat\beta_{jl}z_{lt}\Big) \sum_m (\hat\beta_{km} - \beta_{km})z_{mt} = \sum_m \Big(u_{jm} - \sum_l \hat\beta_{jl}v_{lm}\Big)(\hat\beta_{km} - \beta_{km}) \]

and the first bracket vanishes in virtue of (42.60).

Thus the vectors $x - \hat\beta z$ are orthogonal to the $q$-space. Our original $e$ vectors are resolved into two components, one lying in the $q$-space and the other orthogonal to it. In our $n$-space, orthogonality implies zero correlation, which implies independence for normal variables. Thus the two parts on the right in (42.71) are independent.

It follows that the system represented by $x - \hat\beta z$ has a Wishart distribution of dispersions, but with $n-q$ instead of $n$, the variation being orthogonal to a space of $q$ dimensions.

42.20 We may now consider the testing of a hypothesis concerning regressions. Usually we require to know whether any of the $\beta$'s contribute significantly to the variation of $x$; or equivalently, if we "extract" from $x$ the variation due to a certain subset of $(\beta z)$'s by the usual covariance technique, are the residuals significantly dependent on the remaining $(\beta z)$'s?

Suppose then that we take $q$ $\beta$'s and test the hypothesis that a subset of $m < q$ are zero. On the hypothesis that they are not, we estimate the $q$ $\beta$'s and $\sigma$ and substitute in the likelihood (42.56). Now if we multiply (42.62) by $\alpha_{jk}$ and sum over $j$, $k$, the exponent in the likelihood reduces to a constant. Thus the likelihood is proportional to $|\hat\sigma|^{-\frac12 n}$. On the hypothesis that the $m$ $\beta$'s vanish the same is true, and we require only the dispersion, $\hat\sigma_0$ say, of the form (42.72), with only the remaining $q-m$ $\beta$'s under estimate. Thus the likelihood ratio is

\[ l = \left\{ \frac{|\hat\sigma|}{|\hat\sigma_0|} \right\}^{\frac12 n}. \tag{42.73} \]

(42.73) is distributed as the ratio of two Wishart-type determinants based on $n-q$ and $n-q+m$ sample numbers. Moreover, by the same sort of argument as was used in 42.19, the components corresponding to the $q-m$ $\beta$'s which are supposed not to vanish are orthogonal in the $q$-space to those of the $m$ $\beta$'s which are supposed to vanish. Thus the criterion of (42.73) can be tested in the manner of the $l_{H_2}$ criterion considered earlier in the chapter. The following example will illustrate the method.






Example 42.3 (Data from M. M. Barnard, 1935; Bartlett, 1947)

Miss Barnard had four series of Egyptian skulls: 91 Late Predynastic, 162 from the sixth to twelfth dynasties, 70 from the twelfth and thirteenth dynasties, and 75 Ptolemaic. On each skull four measurements were taken (in millimetres):

$x_1$ = maximum breadth
$x_2$ = basi-alveolar length
$x_3$ = nasal height
$x_4$ = basi-bregmatic height

The means were as follows:

        Series I        Series II       Series III      Series IV
        n_1 = 91        n_2 = 162       n_3 = 70        n_4 = 75
x_1    133.582,418     134.265,432     134.371,429     135.306,664
x_2     98.307,692      96.462,963      95.857,143      95.040,000
x_3     50.835,165      51.148,148      50.100,000      52.093,333
x_4    133.000,000     134.882,716     133.642,857     131.466,667


The sums of squares and products within series (which have 394 degrees of freedom) are

           x_1             x_2             x_3             x_4
x_1    9661.997,440     445.573,301    1130.623,900    2148.584,210
x_2                    9073.115,207    1239.221,990    2255.812,722
x_3                                    3938.320,351    1271.051,662
x_4                                                    8741.508,829
                                                                  (42.75)

The similar sums for all observations together (397 d.fr.) are

           x_1             x_2             x_3             x_4
x_1    9785.178,098     214.197,666    1217.929,248    2019.820,216
x_2                    9559.460,890    1131.716,372    2381.126,040
x_3                                    4088.731,856    1133.473,898
x_4                                                    9382.242,720
                                                                  (42.76)

By subtraction, the sums of squares and products between classes (3 d.fr.) are

           x_1             x_2             x_3             x_4
x_1     123.180,628    -231.375,635      87.305,348    -128.763,994
x_2                     486.345,863    -107.505,618     125.313,318
x_3                                     100.411,505    -137.580,764
x_4                                                     640.733,891
                                                                  (42.77)


We first test whether the data are homogeneous, in particular whether the series differ in their means. The appropriate criterion is the ratio of the determinants of (42.75) and (42.76), as in (42.17). The logarithmic criterion is then 77.3 and the number of degrees of freedom is 12. Thus we conclude that the series are not homogeneous in their means.

We may, however, go further at this point. Are the differences between series attributable to all four variables or, for example, do $x_3$ and $x_4$ show differences only because of their correlation with $x_1$ and $x_2$? To test this we determine the regressions of $x_3$ and $x_4$ on $x_1$ and $x_2$, remove them from the total variation, and test the residual matrices. Thus we regard $x_3$ and $x_4$ as $x$ and $x_1$, $x_2$ as the $z$'s of (42.55).

First of all, the dispersion matrix of $x_1$, $x_2$ from (42.75) is

\[ \frac{1}{394}\begin{pmatrix} 9661.997,440 & 445.573,301 \\ 445.573,301 & 9073.115,207 \end{pmatrix}, \]

the inverse of which is

\[ 394 \times 10^{-4}\begin{pmatrix} 1.037,332 & -0.050,942 \\ -0.050,942 & 1.104,659 \end{pmatrix}. \tag{42.78} \]

The variation due to regression of $x$ on $z$ is, from (42.72),

\[ \hat\beta v\hat\beta' = (uv^{-1})(v)(v^{-1})'u' = uv^{-1}u'. \tag{42.79} \]

In our case $x$ refers to $x_3$ and $x_4$, $z$ to $x_1$ and $x_2$, so we find for this expression

\[ \begin{pmatrix} 1130.623,900 & 1239.221,990 \\ 2148.584,210 & 2255.812,722 \end{pmatrix} 10^{-4}\begin{pmatrix} 1.037,332 & -0.050,942 \\ -0.050,942 & 1.104,659 \end{pmatrix} \begin{pmatrix} 1130.623,900 & 2148.584,210 \\ 1239.221,990 & 2255.812,722 \end{pmatrix} = \begin{pmatrix} 287.967,620 & 534.238,796 \\ 534.238,796 & 991.621,041 \end{pmatrix}. \tag{42.80} \]

Subtracting this from the matrix of $x_3$, $x_4$ in (42.75) we find the residual matrix, with $394 - 2 = 392$ d.fr.,

\[ \begin{pmatrix} 3650.352,731 & 736.812,866 \\ 736.812,866 & 7749.887,788 \end{pmatrix}. \tag{42.81} \]

Similarly, operating on (42.76) for the totals of products, we find the residual

\[ \begin{pmatrix} 3809.335,190 & 611.698,381 \\ 611.698,381 & 8393.755,848 \end{pmatrix}. \tag{42.82} \]

The question is whether the matrices (42.81) and (42.82) are significantly different. We can regard the latter as the residual in the regression of $x_3$, $x_4$ on $x_1$, $x_2$ plus a vector representing the mean; the former has had the mean abstracted in each class. The ratio (42.73) of their determinants is

\[ l^{2/n} = \frac{0.277,469}{0.316,003} = 0.8781. \]
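The matrix arithmetic of this part of Example 42.3 can be retraced as follows. This is a sketch under stated assumptions, not part of the original text: it uses the within-series sums of squares and products of $x_1, x_2$ as $v$, the cross-products of $x_3, x_4$ with $x_1, x_2$ as $u$, and the residual matrix from the totals as quoted in the example; the helper names are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

# Within-series sums of squares and products, as quoted in the example
v = [[9661.997440, 445.573301], [445.573301, 9073.115207]]      # x1, x2 with x1, x2
u = [[1130.623900, 1239.221990], [2148.584210, 2255.812722]]    # x3, x4 with x1, x2
w_xx = [[3938.320351, 1271.051662], [1271.051662, 8741.508829]] # x3, x4 with x3, x4

# Variation due to regression, u v^{-1} u', and the within-series residual
reg = matmul(matmul(u, inv2(v)), transpose(u))
resid_within = [[w_xx[i][j] - reg[i][j] for j in range(2)] for i in range(2)]

# Residual obtained from the totals of products, as quoted in the example
resid_total = [[3809.335190, 611.698381], [611.698381, 8393.755848]]

ratio = det2(resid_within) / det2(resid_total)    # l^(2/n)
```

The regression matrix agrees with the one displayed in the example, and the ratio of determinants comes out close to 0.8781.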


,h '' “>• “”«» !>».«. «,£d” 

A further question considered by Miss Barnard was whether these variables might each have a linear regression on time. To investigate this we require a time variable, and the intervals between the four series were taken proportionately to 2 : 1 : 2. We may therefore conveniently take the values of $t$ as $-5$, $-1$, $1$, $5$. On this basis

$S(t-\bar t)^2 = 4307.668,32$
$Sx_1(t-\bar t) = 718.762,86$
$Sx_2(t-\bar t) = -1407.260,75$
$Sx_3(t-\bar t) = 410.101,94$
$Sx_4(t-\bar t) = -733.427,58$

We are now examining the regression of each of the $x$'s on the extraneous variable time. The sums of squares and products due to regression (1 degree of freedom) are

           x_1             x_2             x_3             x_4
x_1     119.930,358    -234.810,812      68.428,625    -122.377,258
x_2                     459.734,449    -133.975,163     149.601,596
x_3                                      39.042,852     -69.824,358
x_4                                                     124.874,099
                                                                  (42.83)

Here, for example, the item in row 1 and column 2 is

\[ \frac{Sx_1(t-\bar t)\ Sx_2(t-\bar t)}{S(t-\bar t)^2} = \frac{(718.762,86)(-1407.260,75)}{4307.668,32} = -234.810,812. \]

The residual after removing the regression on time from the original matrix (42.76) is given, with 396 d.fr., by

           x_1             x_2             x_3             x_4
x_1    9665.247,740     449.008,478    1149.501,013    2142.197,474
x_2                    9099.726,441    1265.691,535    2231.524,444
x_3                                    4049.689,004    1203.298,256
x_4                                                    9257.368,621
                                                                  (42.84)


T 







We may also test whether this residual is homogeneous, taking the variation within series, (42.75), for comparison; the criterion is again a ratio of determinants of the form $l^{2/n}$.

42.21 The likelihood criteria which we have considered so far are ratios of determinants of dispersions of one kind or another and, algebraically speaking, are relatively simple. Other functions which might be useful in testing are the latent roots of dispersion matrices. For example, the equality of two dispersion matrices $A$ and $B$ depends on the nearness to unity of the roots of $|A - \lambda B| = 0$. In a sense, tests of ratios of the type $|A|/|B|$ can be carried out on latent roots but, as noted in 41.27, the distributions are not yet well tabulated. Although we may derive simple approximations, they are not so good as those for the likelihood criteria, which can be taken to any desired degree of accuracy by the methods of 42.11.

In fact, some of our likelihood ratio criteria are symmetric functions of the latent roots. For example, in $|A - \lambda B| = 0$ the product of all $p$ roots is $|A|/|B|$, as is easily seen by writing the equation as $|AB^{-1} - \lambda I| = 0$. Cases where we are more interested in individual roots occur in the next chapter.

Example 42.4 (Foster and Rees, 1957. Data from Ashton, Lipton and Healy, 1957)

Data are given for two measurements ($p = 2$) on three groups of males: human, chimpanzee, and orang-utan. The measurements were on tooth length ($x_1$) and breadth ($x_2$) for the permanent upper second premolar, and were transformed to logarithms to stabilize variance. The means were:

               No. in group      x_1        x_2
Human               59          1.846      1.981
Chimpanzee          55          1.865      2.008
Orang-utan          43          1.986      2.119

The sums of squares and products were as follows:

                   d.fr.     x_1^2         x_1 x_2       x_2^2
Between groups        2     0.544,941     0.525,765     0.509,075
Within groups       154     0.137,786     0.069,342     0.092,792
Total               156     0.682,727     0.595,107     0.601,867
                                                                  (42.85)

To test homogeneity we may consider the roots of the determinantal equation $|B - \lambda T| = 0$, where $B$ and $T$ denote the between-groups and total matrices of (42.85). The resulting equation is a quadratic in $\lambda$ with roots

\[ \lambda_1 = 0.020,238, \qquad \lambda_2 = 0.856,543. \]

Testing the greater root, we find that the observed value is highly in excess of what homogeneity would lead us to expect.

If we had wanted to test the hypothesis that $A$ and $B$ are equal, without knowing which is larger, we should have had to test the smallness of the lesser root as well. In the general case this presents a theoretical difficulty, since the joint distribution of smallest and largest roots is not known. However, as might be expected, they tend to independence for large $p$.
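For $p = 2$ the latent roots are the solutions of a quadratic and can be found directly. The sketch below is illustrative and not part of the original text: it assumes that the roots quoted in Example 42.4 are those of the between-groups matrix against the total matrix, an assumption which reproduces the quoted values closely; the names `roots_2x2`, `between` and `total` are hypothetical.

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def roots_2x2(b, t):
    """Roots of |b - lambda t| = 0 for symmetric 2x2 matrices, smallest first."""
    qa = det2(t)                                               # coefficient of lambda^2
    qb = -(b[0][0] * t[1][1] + t[0][0] * b[1][1] - 2 * b[0][1] * t[0][1])
    qc = det2(b)                                               # constant term
    disc = math.sqrt(qb * qb - 4 * qa * qc)
    return (-qb - disc) / (2 * qa), (-qb + disc) / (2 * qa)

# Sums of squares and products from Example 42.4
between = [[0.544941, 0.525765], [0.525765, 0.509075]]
total = [[0.682727, 0.595107], [0.595107, 0.601867]]

small, large = roots_2x2(between, total)
```

The product of the two roots equals the ratio of the determinants of the two matrices, in line with the remark in 42.21 that the product of all the roots is a determinantal ratio.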


Power of the tests

42.22 We have already remarked on the embarrassing profusion of parameters appearing in a multivariate situation. Our criteria of testing in the normal null case do not contain them because we estimate them all; but when we wish to specify alternatives in order to ascertain the power of a test we are in a position of some complexity.

For a test of a mean vector based on $T^2$ (cf. 41.16) the power can be ascertained from existing tables. In fact, if the parent vector is $\mu$, the distribution of $T^2$ based on another vector $\mu_0$, namely

) T2 = Kx-lO' c —1 (x p> 0 )> (42.86) 

has a non-central $F$ distribution with $p$, $n-p$ degrees of freedom and non-centrality parameter $n(\mu - \mu_0)'\gamma^{-1}(\mu - \mu_0)$. We can then use non-central $F$, or one of the approximations to it (cf. 24.32), provided that we can specify $\gamma$. It may also be shown (Simaika, 1941) that $T^2$ is uniformly most powerful in the class of tests whose power depends only on the non-centrality parameter (cf. 24.36 in the univariate case). Similar remarks apply in the two-sample case (cf. 41.17).


42.23 For further studies of distributions and tests in the multivariate case reference may be made to the books by T. W. Anderson (1958) and E. L. Lehmann (1959).

The problems associated with the Behrens-Fisher test for the difference of two means when variances are not equal have given rise to considerable controversy and a good deal of alleged paradox in multivariate extensions (see Mauldon (1955)). We noted in 21.15, Vol. 2, that the problem can, in fact, be solved by a method due to Scheffé which avoids these difficulties. This method has been generalized by Bennett (1951) to the multivariate case. See T. W. Anderson (1958, Section 5.6) and (1964), and Exercise 42.12.

For power functions see Seber (1964b), Darroch and Silvey (1963), Hogg (1961). Das Gupta et al. (1964) and T. W. Anderson and Das Gupta (1964a, b) obtained results on the monotonicity of the power functions of a number of tests of multivariate hypotheses.

Arnold (1964) has considered the distribution of $T^2$ under permutations, and Ito and Schull (1964) have discussed the robustness of the $T_0^2$ test, a generalization to several samples by Lawley (1939) and Hotelling (1951) of the two-sample $T^2$ test for the equality of mean-vectors. Mikhail (1965) (cf. also Ito (1962)) compares the power of $T^2$, the















i?i A 0 v 0 F | 

rfHE 0 *** tntically equivalent; i n I 

*nVA^ CBD 4 ts are »sy*»P the tests using a diff<N 

A test J 1966) co^; 6 een (42.17) and 2* r ^t 
A pother » S ch atz , i choose b tw ^ power Q f a jn 

a*V goring one or two constrain,,. «>- 
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s£^ 

con,inuous multiv< 


Si 

cri 



in 


42-24 vector 

distSS 8 - eX ERC iSES . 1 

d h >6 becomes equivalent to an i?. test( 


• n /n of 42.t> Dec--- - * 

f the critert° n w* 

vari0US domains of the parameter 

-.pqc over 1,11 j 


42J Show that tori’ 


s 1 the use 


, 22 B y considering ft >>**£ 

422 «**“*’ for the criterion 


,ization process 


show that V"""' ghoW that, tor 

4 2 3 Following E S”?Jt_l)f(f+D «'• and „ , 

appS»* . ** «* ><* 1W / ! 


-- 

(42.18) is 


l Hl , — 2/> log l is distribute^ 


42.4 


a r„ of (42 44) can be represented as the product of independent variables * 
Show that y °f (42 .“1 1 

n l n yh\ 

j=Z l.t=l J 

with parameters l(n-pj-k), iPi 
0 — 1 


Beta-variable 


j-l 

pi = 2 pa- 
a=l 


where yjic is a 
and 

42.5 Derive equation (42.41). 

42.6 Derive equations (42.52)—(42.54). 

42.7 A sample of $n$ values is given from a single $p$-variate normal population. Consider
the hypotheses

$H$: that means and dispersions are equal for each variate;

$H_1$: that dispersions are equal regardless of means (i.e. all variances are equal and all covariances
are equal);

$H_2$: that, given equality of dispersions, all means are equal.

Show that the likelihood ratio criteria are given by

Cjk ' 


P/n = 


/*/« 




•i) 


a-r.F-Ms&’fl+^-rT)-} 

--— \jCik j 


" ,he varia„ ce £ ,J /ft 




r = s 


4 c */{f(f.-i) s2)i 






TWS O, ™ OTMSK m M0LTOj , ra 

, a , n are the variance and correlation calculate * 

^ &r " , 1 £ , " lMed 'he pooled 





variables, e.g 

P jh ^ ‘ Vo - n»~JL_ y _ X9 

Show that $-2\log l$, $-2\log l_1$ and $-2\log l_2$ are distributed as $\chi^2$ with
$\tfrac12 p(p+3)-3$, $\tfrac12 p(p+1)-2$ and $p-1$ d.fr. respectively.

(Wilks, 1946)

« 8 USe ^ “ tIOn ° f (42 ' 25) » «*» «he conclusion „ f Esample 42 . 4 . 

42.9 Verify the results of 42.12.

42.10 Derive $T^2$ as a likelihood ratio criterion in the form

l 


(•*£)" 

derive its large-sample distribution in the null case. 


42 'H Sh0W fUrther **“ in t e gen - al Wi,h * aantple from N(l r, v) , and 

_ 1 =n ( x -V-oY c-^x-^), 

th® ^T-V, U dUtribu,ed in * e “"-central F form with p, n-p d.fr. and non-centrality 
factor «(|T-tT o y Y = t2 » say. 


pop^L^viknV;)” 1 ^. 0 ^ *■ 2 ." !) are Samp, “ - *"*• 


yj = x w — 


so that y = x (1) —jc< 2 >. 

Show that the covariance matrix of the y’s is given by 


J~ x ? ) + —-_ 2 xf - — § xf 

V M 2 V0*l*s) j=l «2 Jfc=l ’ 

■7 — 2,..., w ls & = 1, 2,..., n 2 , 


Defining w by 


u a.p — d a /9 ^Yi + ~ Y2^- 


Wl 


show that 


is distributed as T 2 with — 1 d.fr. 


(« x -l)w= 2 (yy-y)(yi-y)', 
i=i 

T 2 = y 2 w " 1 y 


(Bennett, 1951) 


42.13 Show that any test based on $T^2$ is invariant under a non-singular linear transformation
of the variables with matrix, say, $M$. By considering a transformation reducing the dispersion
matrix $c$ to $I$, show that the only invariant function involving only $\bar{x}$ and $c$ is $\bar{x}' c^{-1} \bar{x}$.

42.14 Referring to Exercises 42.11 and 42.13, show that the distribution of T 2 may be 
written as a constant times 

, . 2 * {W{T*/{n-\)1r *+lTQ fi+j) 

exp (-mt ) _2 j[nhl+p) {l + T“/(n-l)>+^ 

Noting that the most powerful test using $T^2$ against $\tau^2 \neq 0$ is the ratio of this density to the value it
takes when $\tau = 0$, show that the test is uniformly most powerful.



■ 
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r pUKUi' A 

the advanced the partition of p into 

5«S'S“-“"“ -_.... 


°w, 




v _, vji ^ = d **■» ma * rix v ° £ (42 - 59) ■ 
where v u . 2 = Vl1 "^? a L . \ 

i VIZ. /_ ir.» \ 


into 


V 


where Vu.a " 11 as 

rows and columns, viz. 


C 


Vn v i2 
V 2 i v 22y 


(T. W. Anderson, 


„ , . „ the hypothesis a is true, the moments of l in the previous 
42.16 Show that when tn yp 

are given by ^ TtUn - q + 1 - ./) + <} D( K W ~~g 2 + l ~j)} 

e(/o= n fTK^n^Tnr^ + 1 + 

^ W (T. W. Anderson, 


l9 58) 


exer. 


■c lSe 


1958) 













CHAPTER 43

CANONICAL VARIABLES

43.1 Apart from problems of distributional mathematics, multivariate analysis
suffers from one serious handicap in practical application: the difficulty of disentangling
complicated inter-relationships among the variables and of interpreting the results
of the analysis. This leads us to attempt to reduce the number of variables, on the one
hand; and to transform them to independence, on the other. The techniques described
in this chapter are motivated by one or both of these objectives.

Component analysis

43.2 As usual, we consider a row vector $x = (x_1, x_2, \ldots, x_p)$ representing a $p$-
dimensional random variable and $n$ observations on it, which we arrange
in a $p \times n$ matrix $x$. It will often be convenient to measure each $x$ about the mean
of its $n$ values, in which case the observed dispersion matrix $c$ is given by

$$c = \frac{1}{n}\, x x'. \qquad (43.1)$$

We recall that if $c$ is of rank $m < p$ there are $p - m$ linear relations among the $x$'s. This
implies that there is at least one linear transformation to new variables which are only
$m$ in number; our true dimensionality, so to speak, is $m$, fewer than $p$. The result
derives from the fact, which is not difficult to prove, that the rank of a matrix multiplied
by its transpose is the rank of the original matrix.
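
The rank statement can be checked numerically. The following is a minimal sketch in Python with NumPy on simulated data; the sizes, the seed and the built-in linear relation are illustrative assumptions and do not come from the text. The matrix `x` is held as a $p \times n$ array measured about its row means, as in (43.1).

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 50                              # illustrative sizes (assumed)
x = rng.normal(size=(p, n))
x[3] = x[0] + 2.0 * x[1]                  # impose one exact linear relation among the x's
x = x - x.mean(axis=1, keepdims=True)     # measure each variable about its mean

c = x @ x.T / n                           # observed dispersion matrix, as in (43.1)

# An explicit tolerance keeps rounding error from being counted as rank.
rank_x = np.linalg.matrix_rank(x, tol=1e-10)
rank_c = np.linalg.matrix_rank(c, tol=1e-10)
print(rank_x, rank_c)
```

With one linear relation among four variables, both ranks should equal $p - 1$, so that the true dimensionality is three.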


Example 43.1 

Consider the $p \times p$ matrix

$$\begin{pmatrix} 1 & \rho & \rho & \cdots & \rho \\ \rho & 1 & \rho & \cdots & \rho \\ \vdots & & & & \vdots \\ \rho & \rho & \rho & \cdots & 1 \end{pmatrix}. \qquad (43.2)$$

Add the rows and take out the common factor $1+(p-1)\rho$. Subtract $\rho$ times the
resulting unit row from each other row. We then see that the determinant of the
matrix is

$$(1-\rho)^{p-1}\{1+(p-1)\rho\}. \qquad (43.3)$$

Except in the special case $\rho = 1$ or $\rho = -1/(p-1)$ this cannot vanish. The rank
of (43.2), accordingly, is $p$. Hence we cannot represent a set of equally correlated
variables in fewer than $p$ dimensions.

We may remark without proof (for which see Ledermann, 1937) that the number of
independent conditions on a symmetric matrix for it to be of rank $m$ is $\tfrac12(p-m)(p-m+1)$.
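
As a numerical check on (43.3), the sketch below builds the equicorrelation matrix (43.2) and compares its determinant with the closed form. It is written in Python with NumPy; the helper name `equicorrelation` and the chosen values of $p$ and $\rho$ are assumptions made for illustration only.

```python
import numpy as np

def equicorrelation(p, rho):
    """p x p matrix with units in the diagonal and rho elsewhere, as in (43.2)."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

p, rho = 5, 0.3                                   # illustrative values (assumed)
det_direct = np.linalg.det(equicorrelation(p, rho))
det_formula = (1.0 - rho) ** (p - 1) * (1.0 + (p - 1) * rho)   # (43.3)

# In the special case rho = -1/(p-1) the determinant vanishes and one dimension is lost.
rank_special = np.linalg.matrix_rank(equicorrelation(p, -1.0 / (p - 1)), tol=1e-10)
print(det_direct, det_formula, rank_special)
```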
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433 We may represent the ; ons> one for each variable, and rcg^ \ 

*» i, f s. 

don, and consider each var able ied in the *.d,menstonal space). j„ % 

(lying » a „Tf ,he matrix x to rank «<f> tmpl.es that the , ^ 


“ls 0 lifi« e ’a d s«b"f * dimensions. 

43.4 Consider a transformation to new variables $\xi$ given by

$$\xi = a x, \qquad (43.4)$$

where $a$ is a matrix of coefficients. We confine our attention to linear transformations.
Non-linear transformations are much more difficult to handle and, if non-linearity is
suspected to exist, an attempt should be made to linearize the data beforehand, for
example, by a logarithmic transformation.

We shall, in fact, specialize $a$ to be orthogonal and call it $l$. Specifically

$$l\,l' = l'\,l = I. \qquad (43.5)$$

We then have for the dispersions of the $\xi$'s, say $V(\xi)$,

$$V(\xi) = l' c\, l. \qquad (43.6)$$

It follows, of course, that

$$l^{-1} = l'. \qquad (43.7)$$

There are $p^2$ coefficients $l$. Equation (43.5) imposes $\tfrac12 p(p+1)$ conditions on them,
$\tfrac12 p(p-1)$ for the off-diagonal products and $p$ for the diagonals. There are thus
$\tfrac12 p(p-1)$ degrees of freedom in the transformation. Geometrically it is equivalent
to a rotation in our $p$-space.

We may find one such transformation, at least, for which the $\xi$'s are uncorrelated,
for this imposes $\tfrac12 p(p-1)$ conditions on them. If the resulting variances
are represented by the diagonal matrix $\Lambda$, we have

$$l' c\, l = \Lambda \qquad (43.8)$$

and hence, in virtue of (43.5),

$$c = l \Lambda l', \qquad (43.9)$$

which is equivalent to

$$c\, l = l \Lambda. \qquad (43.10)$$

Considering the first column of the equation $c\,l = l\Lambda$, we have

$$\sum_{k=1}^{p} c_{jk} l_{k1} = \lambda_1 l_{j1}, \qquad j = 1, 2, \ldots, p, \qquad (43.11)$$

or

$$(c - \lambda_1 I)\, l_1 = 0. \qquad (43.12)$$

A similar equation is obeyed by the other values of $\lambda$, and for a non-zero solution in
the $l$'s each must satisfy

$$|c - \lambda I| = 0. \qquad (43.13)$$

Hence the $p$ values of $\lambda$ are the latent roots of $c$. The corresponding vectors $l$ are
the latent vectors. We shall call the variables $\xi$ the principal components.
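
The relations (43.5) to (43.10) can be verified on simulated data. The sketch below, in Python with NumPy, is illustrative only: the data and seed are assumed, and it takes the latent vectors as the columns of `l`, so that the new variables are computed as $\xi = l'x$, which is the convention under which $V(\xi) = l'cl$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 200                                   # illustrative sizes (assumed)
x = rng.normal(size=(p, n)) * np.array([[3.0], [2.0], [1.0]])
x -= x.mean(axis=1, keepdims=True)
c = x @ x.T / n                                 # dispersion matrix

# eigh handles symmetric matrices and returns the roots in ascending order,
# so reorder them to be of diminishing magnitude.
roots, l = np.linalg.eigh(c)
order = np.argsort(roots)[::-1]
roots, l = roots[order], l[:, order]

xi = l.T @ x                                    # principal components
v_xi = xi @ xi.T / n                            # their dispersion matrix

print(np.round(roots, 4))
print(np.round(v_xi, 4))
```

The printed dispersion matrix of the $\xi$'s should be diagonal, with the latent roots in the diagonal.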

43.5 In general there are $p$ different latent roots of the matrix $c$. We shall
find it convenient to regard them as of diminishing magnitude, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$.
If and only if the last $k$ are zero will the matrix be of rank $p-k$. The size of
the latent roots thus gives us a test of the rank of the matrix. We go
further, and say that if $\lambda_p$ is small the variation is "nearly" in $p-1$ dimensions, and
so on.

From the manner of derivation it is clear that the $\xi$-axes in our $p$-space of the first
kind are orthogonal. But we have also transformed to variables which are uncorrelated.
Thus the corresponding vectors in our space of the second kind are also orthogonal.
Evidently the transformation is unique because $c$ has only one set of latent
roots except in degenerate cases. Hence our transformation is the only one which
simultaneously produces orthogonality in both the spaces.


43.6 Consider the variance of $\xi_j$,

$$\operatorname{var}\xi_j = l_j' c\, l_j. \qquad (43.14)$$

Suppose we seek to maximize this, subject to

$$l_j' l_j = 1.$$

With a Lagrange multiplier $\lambda$ we then have to maximize unconditionally

$$l_j' c\, l_j - \lambda\, l_j' l_j, \qquad (43.15)$$

which, on differentiation by $l_{jk}$, $k = 1, 2, \ldots, p$, leads to a set of equations summarized
by

$$(c - \lambda I)\, l_j = 0. \qquad (43.16)$$

Comparison with (43.13) shows that the values of $\lambda$ are again the latent roots.
From (43.14) and (43.16) we find that the maximum value of $\operatorname{var}\xi_j$ is in fact the
corresponding latent root. Our new variable $\xi_1$ thus has the property of possessing the
greatest variance of any linear function of the $x$'s. $\xi_2$ will have the greatest variance
among linear functions orthogonal to (uncorrelated with) $\xi_1$, and so on.
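
The maximizing property can be illustrated by comparing the variance $l'cl$ for many randomly chosen unit vectors with the largest latent root. This is a rough sketch in Python with NumPy; the dispersion matrix `c` is an assumed example, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
c = np.array([[4.0, 2.0, 0.6],
              [2.0, 3.0, 0.4],
              [0.6, 0.4, 1.0]])            # assumed dispersion matrix

roots, l = np.linalg.eigh(c)               # ascending order
lam1, l1 = roots[-1], l[:, -1]             # largest root and its latent vector

# Variance l'cl of 10,000 random linear functions with l'l = 1.
trial = rng.normal(size=(10000, 3))
trial /= np.linalg.norm(trial, axis=1, keepdims=True)
trial_var = np.einsum('ij,jk,ik->i', trial, c, trial)

print(lam1, l1 @ c @ l1, trial_var.max())
```

No trial vector should give a variance exceeding the largest latent root, which is attained by the first latent vector itself.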


* 


43.7 It is instructive to consider the same problem geometrically. Consider the
$n$ sample points in our first type of $p$-space, measured about their mean and standardized
so as to have unit variance. Thus


Xj a = 0 . 


n 

s 

a=l 
n 

2 Xj a = 1 . 


(43.17) 

(43.18) 
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j&cED tHE dircct ion cosines «<, 

TUB ^ X and at -*«p 

, c0 ,orain^ s " ^ - 

current co 
Xx^i 
th 


u, 


( 43 . 


joints 


„ '«) 

from this line is given by S _ 


ii i 

m 


f/.V <•;. 


s[£(*' 

,=i U* 1 . 


.*)* 

min 


’£ % 
4*1 




( 43 .; 


■ 20 ) 


imum 


The partial derivatives of (43^ 


■%) 


= 0. 


( 43 . 


v 

» line wiu. - ^ 

0 f the * P 

ofsq uarescf diStan “ 

The sum 01 q 
where c 

a==i this is a 11111X1 

I,« US evaluate .J*‘ ^"| 

with respect to e _ 5( ^ W; ) + fH> 

/== 1 ,v a ; t ;l» 

• lies on *&* ^ ““ P”"*" ^ 

H““ ,he °hfe" Sitheline«» esthr ° Ug 

t """ ^ $ (!%*)• ( 43 . 23 ) 

a * " 1 


i’s to be zero. 


21) 


(43-22) 
This i s 


what vve might 
using (43-18) we 


S-P'l 


The «’s are subject to 

minimize unconditionally 


the orthogonahzing 

n ft 

p-iL^ 

Differentiation by u k leads to 


' P \ 

2 fyet J ’ 

condition 2 = 1* We then have to 


FI 2 

-j- 2, 2 Wy» 


S x ja x H u r hi k = 0 


a=l 


or 


2 fjkUj H 


y=i 


(43.24) 

(43.25) 

(43.26) 

(43.27) 


The elimination of the $u$'s leads us back to

$$|r - \lambda I| = 0.$$

Thus the appropriate $\lambda$ is a latent root of the correlation matrix. If we had not
standardized by reducing the initial variation to unit variance we should have arrived
at the latent roots of the dispersion, not the correlation, matrix.

Moreover, from (43.23) and (43.25) we find

$$S = p - \lambda. \qquad (43.28)$$

It follows that the latent root which gives the minimum value to $S$ is the largest latent
root. Our line corresponds to $\xi_1$ and is such that the sum of squares of distances
of sample points from it is a minimum.

Let us now project the points on to a hyperplane perpendicular to the line
(43.19) and repeat the process, finding a line in that hyperplane such that the sum
of squares of distances of the points from it is a minimum. Our line in the
$(p-1)$-space will be given by the second largest latent root of (43.27). This is not
immediately obvious. However, we saw earlier that the second latent vector is orthogonal
to the first and hence lies in the $(p-1)$-space; and that it is derived by maximizing
a variance, which is equivalent to minimizing the sum of squares of distances from
the line.


*‘ ; r>T, 




0 


43.8 The following points may be briefly noted:

(1) The latent roots of a dispersion matrix are all real and non-negative. This stems
from the fact that $c$ is non-negative definite. A formal proof will be found in
most textbooks on matrix theory. See, however, the warning in 43.36.

(2) In general the latent roots are unequal, but some may be equal in
particular cases. Where equality exists there is nothing in their
size to pick out any one (among the latent vectors corresponding to the equal roots)
as having priority. Any orthogonal set will do.

(3) The sum of the latent roots, from (43.13), is the sum of the diagonal elements
of the dispersion matrix, namely its trace. Likewise the product of the latent
roots is the determinant of the dispersion matrix.

(4) If $A$ and $B$ are both non-degenerate dispersion matrices the latent roots of
$|A - \lambda B| = 0$ are the same as those of $|B^{-1}A - \lambda I| = 0$. In particular, if $A = I$
we see that the latent roots of the inverse are the reciprocals of the latent roots of
the matrix.

43.9 The question of standardization requires more attention. It has been
customary, especially in psychological work, to standardize the dispersion matrix by
dividing by appropriate (sample) standard deviations and hence to reduce it to the
correlation matrix. In such a case the sum of the variances of the $\xi$'s is equal to the
dimension number $p$. In effect, the procedure reduces all the variables to equal
importance as measured by scale.

However, the latent roots and latent vectors are not invariant under changes of
scale. In the geometrical representation of 43.7 perpendicular distances are no longer
perpendicular. Thus, in general, we get different results according to whether a
scale is initially imposed on the system or not. The point is illustrated in Example
43.4 later. Whether standardization is desirable is, in the ultimate analysis, to be
decided on non-statistical grounds. From the statistical viewpoint it is a nuisance,
especially in sampling investigations, because it complicates the distributional theory.
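
A small numerical illustration of the lack of invariance under changes of scale follows. It is a sketch in Python with NumPy, using an assumed two-variable dispersion matrix in which one variable has a much larger scale than the other.

```python
import numpy as np

c = np.array([[100.0, 6.0],
              [6.0, 1.0]])                 # assumed dispersion matrix
sd = np.sqrt(np.diag(c))
r = c / np.outer(sd, sd)                   # corresponding correlation matrix

roots_c = np.linalg.eigvalsh(c)[::-1]      # diminishing magnitude
roots_r = np.linalg.eigvalsh(r)[::-1]

# Proportion of the total variance carried by the first component in each case
share_c = roots_c[0] / roots_c.sum()
share_r = roots_r[0] / roots_r.sum()
print(round(share_c, 4), round(share_r, 4), roots_r.sum())
```

On the dispersion matrix the first component is dominated by the variable with the large scale; after standardization the two variables rank equally and the first root takes a smaller share, while the roots of the correlation matrix sum to $p$.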


43.10 The actual solution of equation (43.13) by desk-machine is a rather tedious
matter. For details of the iterative process involved see Kendall (1961b). The advent
of the electronic machine has altered the arithmetical situation completely, and most
machines are programmed to handle quite large matrices and print out the appropriate
latent roots and latent vectors. We shall therefore not allot space to the problem of
computation.
Example 43.2

Five psychological tests were carried out on 123 individuals. The correlations
between scores on the tests were as follows:



-0178 

-0-304 

0-372 

-0013 

1 - 


(43.29)

A principal component analysis gives the following latent roots and vectors:

Latent                            Vectors
roots           1          2          3          4          5

1.75714      0.55550    0.56470   -0.27000    0.23572   -0.49403
1.33070     -0.18568   -0.24745   -0.66199   -0.55654   -0.39538
0.78086      0.21597    0.43969    0.32041   -0.78704    0.19478
0.70916      0.64078   -0.30765   -0.39839   -0.01481    0.57950
0.42214      0.44688   -0.57611    0.47696   -0.12266   -0.47523
-•47523 


The matrix $l$ is given by the five columns on the right. For example

$$\xi_1 = 0.55550x_1 + 0.56470x_2 - 0.27000x_3 + 0.23572x_4 - 0.49403x_5$$

and, reading downwards,

$$x_1 = 0.55550\xi_1 - 0.18568\xi_2 + 0.21597\xi_3 + 0.64078\xi_4 + 0.44688\xi_5.$$

The original data, of course, hardly bear five-figure accuracy in these results but it
is convenient to retain them for checking purposes.

In psychological work it is customary to express these coefficients in a different
form. Instead of the variables $\xi$ we introduce the variables

$$\zeta_j = \xi_j/\sqrt{\lambda_j} \qquad (43.33)$$

so that the $\zeta$'s have unit variance. The matrix of coefficients of the $x$'s in terms of
the $\zeta$'s is

                            ζ's
x's         1          2          3          4          5

 1       0.73635   -0.21419    0.19081    0.53961    0.29035
 2       0.74855   -0.28545    0.38853   -0.25908   -0.37432
 3      -0.35790   -0.76364    0.28313   -0.33549    0.30989
 4       0.31247   -0.64200   -0.69548   -0.01247   -0.07969
 5      -0.65488   -0.45609    0.17212    0.48801   -0.30877

(43.34)



Thus, for example,

$$x_1 = 0.73635\zeta_1 - 0.21419\zeta_2 + 0.19081\zeta_3 + 0.53961\zeta_4 + 0.29035\zeta_5. \qquad (43.35)$$

These coefficients are known as factor loadings, the $\zeta$'s being regarded as factors
general to the situation and the coefficients as the weights with which they appear in
the different variables.

We will deal with questions of testing and estimation presently. The components
(which are uncorrelated) account, in turn, for 35, 27, 16, 14 and 8 per cent of the
variance. These numbers are obtained by dividing the latent roots by $p$, in this
instance 5. We might, as an approximation, be willing to omit the last variate, in
which case we should lose 8 per cent of the variance; or even the last two variates,
losing 22 per cent. But we still require measurements on all five $x$'s to estimate the
$\xi$'s.
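
The factor loadings and the percentages of variance can be recomputed from the latent roots and vectors of Example 43.2. The sketch below is in Python with NumPy; the array layout is an assumption made for convenience, with each row of `l` holding the coefficients of one $x$ in terms of $\xi_1, \ldots, \xi_5$ (that is, one column of the printed table of vectors).

```python
import numpy as np

roots = np.array([1.75714, 1.33070, 0.78086, 0.70916, 0.42214])

# Row j: coefficients of xi_1..xi_5 in x_{j+1}, read downwards from the printed table.
l = np.array([
    [ 0.55550, -0.18568,  0.21597,  0.64078,  0.44688],
    [ 0.56470, -0.24745,  0.43969, -0.30765, -0.57611],
    [-0.27000, -0.66199,  0.32041, -0.39839,  0.47696],
    [ 0.23572, -0.55654, -0.78704, -0.01481, -0.12266],
    [-0.49403, -0.39538,  0.19478,  0.57950, -0.47523],
])

# zeta_k = xi_k / sqrt(lambda_k), so the coefficient of zeta_k is l_jk * sqrt(lambda_k).
loadings = l * np.sqrt(roots)

# Share of the total variance carried by each component, in per cent.
percent = 100.0 * roots / roots.sum()

print(np.round(loadings, 5))
print(np.round(percent))
```

Each row of `loadings` should have unit sum of squares, since each standardized $x$ has unit variance.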


43.11 We have noted that the latent roots are uniquely determined by the dispersion
matrix. From (43.16), whether relating to sample or parent, it is clear that $l$ is also
uniquely determined, except perhaps for a change of sign, which we can always
determine by taking $l_{j1}$ as positive. Thus there is a one-to-one relation between the latent
roots and latent vectors, and the dispersion matrix and the mean vectors. Since the
sample values of the latter are the ML estimators of the corresponding parent values,
the sample values of the latent roots or vectors are ML estimators of the parent values
in normal variation.

The problem of bias has been considered by Dempster (1966). Exact expressions
are complicated.





Testing of latent roots 

43.12 An exact theory for testing latent roots is difficult to attain, for several
reasons. Distributions are complicated; standardization procedures, as already noted,
further complicate the issue; and we may be interested in the special cases where the
latent vectors are indeterminate in the sense that a group of latent roots may be equal.

Let us be clear about the kind of hypothesis which we wish to test. The first is
whether the sort of transformation which we have been discussing is worth while at all.
This is equivalent to asking whether the latent roots are different from one another.
If they are not, the original $x$'s are just as good as the $\xi$'s for purposes of representation.
To put it another way, are the $x$'s independent?

We arrived at a test of this hypothesis in 42.12. The criterion is then the correlation
determinant, $-n$ times the logarithm of which is distributed approximately in the
$\chi^2$ form with $\tfrac12 p(p-1)$ degrees of freedom. More accurately,

$$-n\left(1-\frac{2p+11}{6n}\right)\log|r|$$

is distributed as $\chi^2$.
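
The two forms of the criterion are easily evaluated. The sketch below, in plain Python, uses the five latent roots of Example 43.2 and obtains the correlation determinant as their product (see 43.8, point (3)); natural logarithms are assumed, and the variable names are illustrative.

```python
import math

n, p = 123, 5
roots = [1.75714, 1.33070, 0.78086, 0.70916, 0.42214]   # Example 43.2

det_r = math.prod(roots)                 # determinant = product of latent roots
df = p * (p - 1) // 2                    # degrees of freedom, p(p-1)/2

chi2_rough = -n * math.log(det_r)
chi2_refined = -n * (1 - (2 * p + 11) / (6 * n)) * math.log(det_r)

print(round(det_r, 5), df, round(chi2_rough, 1), round(chi2_refined, 1))
```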
Example 43.3

In Example 43.2 the value of the correlation determinant is 0.54659. The value of $n$
is 123. Thus, approximately,

$$-123 \log 0.54659 = 74.3$$

is to be referred to $\chi^2$ with 10 d.fr., and we reject the hypothesis.

For the more accurate approximation we have that

$$-n\left(1-\frac{2p+11}{6n}\right)\log|r| = 72.2,$$

also with 10 d.fr. The conclusion is unaffected.

This test is one of independence. If we wish to test both independence and
equality of variance we use the test of 42.13 applied to the dispersion matrix.
The criterion is then

$$-n\{\log|c| - p\log(\operatorname{trace} c/p)\} \qquad (43.36)$$

with $\tfrac12 p(p+1)-1$ d.f. For the more refined test we replace $n$, using (42.54), by

$$n - \frac{2p^2+p+2}{6p}. \qquad (43.37)$$


43.13 This kind of test reveals whether there is any point in transforming to
canonical variables $\xi$.

In one sense no test is required for non-vanishing latent roots. Any value which
is greater than zero cannot have originated from a population in which the corresponding
parent value was truly zero; for, if it had, the parent variation would lie in
a sub-space and no sample point could arise from outside that space. This will not
necessarily be so if the variate values are subject to errors of observation and measurement,
but this case must be deferred for consideration until we examine factor analysis
later.


43.14 We may, however, legitimately ask this kind of question: suppose that
certain latent roots are large and account for most of the variance; do the remaining
values differ significantly among themselves, or could they have arisen from a complex
in which the corresponding variables are effectively spherical or at least

ssr 1 - *»——,»-/i “z 3 tKSt 

.yh. 50m ™ ha[ fistic grounds to t.i 

5S?3; ^ - .SJte" " 

*>* the differences amoneT ^ Sam P Iin g er rors are small ‘eT e E Ual '° S ° me 
correspondence between the^n " ' for us t0 be able to set', g , H ’ COm P ared 

3,1 eqMl - In ,o 

'p-k-x, "J^ rminant this value ' 
P~k~- k 



roots were 


1 y—* 
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(43.38) 
The criterion proposed is therefore the ratio of the two determinants, namely

$$\frac{\lambda_{k+1}\lambda_{k+2}\cdots\lambda_p}{\left\{\dfrac{\lambda_{k+1}+\lambda_{k+2}+\cdots+\lambda_p}{p-k}\right\}^{p-k}}, \qquad (43.39)$$

where the $\lambda$'s are sample values. This may be regarded as the $(p-k)$th power of the
ratio of the geometric to the arithmetic mean of $\lambda_{k+1}, \ldots, \lambda_p$. The
proposal is that minus the logarithm of this quantity, multiplied by a factor involving
$n$, should be tested in the $\chi^2$ distribution with

$$\nu = \tfrac12(p-k-1)(p-k+2) \text{ d.fr.} \qquad (43.40)$$

Lawley (1956a) has shown that if the multiplier is taken as

$$n - 1 - k - \frac{2(p-k)^2+(p-k)+2}{6(p-k)} + \bar{\lambda}^2\sum_{j=1}^{k}\frac{1}{(\lambda_j-\bar{\lambda})^2}, \qquad (43.41)$$

$\bar{\lambda}$ being estimated as the mean of $\lambda_{k+1}, \ldots, \lambda_p$, the criterion has the correct moments
for a $\chi^2$ distribution to $O(n^{-3})$. If $\lambda_1, \ldots, \lambda_k$ are large compared to $\bar{\lambda}$ the last term
of (43.41) could be omitted.

43.15 Strictly speaking, these results apply to the dispersion matrix with units in
the diagonals. Application to a correlation matrix is impaired by the fact that we
standardize using the sample variance. It appears that in this case the criterion does
not follow a $\chi^2$ distribution. However, a rough test may be obtained, faute de mieux,
by using the results of 43.14 as if they applied to a correlation matrix.

In this connexion, consider again the data of Examples 43.2 and 43.3. Suppose
we decide that the two largest roots are different enough to justify a supposition that
they are distinct among themselves and also distinct from smaller values.

The product of the remaining three roots is 0.23376 and their mean is 0.63739.
The multiplier of (43.41), neglecting the last two terms, is 120, and the criterion becomes

$$120\,\{3 \log 0.63739 - \log 0.23376\} = 12.3.$$

From (43.40) the number of degrees of freedom is 5, and the observed value exceeds
the 5 per cent point but not the 1 per cent. We suspect that the last three roots are
genuinely unequal.
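
The arithmetic of this rough test can be reproduced as follows. The sketch is in plain Python, takes the three smallest latent roots from Example 43.2 and the multiplier 120 exactly as quoted in the text, and assumes natural logarithms.

```python
import math

p, k = 5, 2
small_roots = [0.78086, 0.70916, 0.42214]    # the three smallest roots, Example 43.2
q = p - k

product = math.prod(small_roots)
mean = sum(small_roots) / q

multiplier = 120.0                           # value quoted in the text
criterion = multiplier * (q * math.log(mean) - math.log(product))
df = (q - 1) * (q + 2) // 2                  # degrees of freedom from (43.40)

print(round(product, 5), round(mean, 5), round(criterion, 1), df)
```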

Large-sample results for latent roots 

43.16 We can make further progress by considering asymptotic theory, namely 
standard errors and covariances when the parent latent roots are all distinct. The 
results were first obtained by Girshick (1939). 

It is indifferent, to our order of approximation, whether we write our formulae in 
terms of parent values or of sample values. We will use sample values. We then 
have the following relations: ... ... 

su.-** ( 43 - 42 ) 


S Cj a l a lc ~ h h>V 

an d, using the Kronecker delta as before, we derive from (43.43) 


(43.43) 


X Cir, Ink h m — ^ 


k i jm* 


(43.44) 
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*=" have .. . S= 


(43 


From (43.43) we then ^ S ^ ^ 

« “ irinnse the axes rotated to the £-axes, j n . 

Of generality we may n^jPP ^ terms on each side of (43.45) ^ 

and ca = 


Without loss of generality 

case % = fa ^d 4 * ' U ’ J 

find dCjj = x 

/i 3 'I = COV O'}]' > *7cA:/ 

cov (a/> h) K , 

Thus _ _ Voc s for the normal case 

oi 2 


and we 


‘cel 

W.4 6) . 




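which, by use of (41.98), gives

$$\operatorname{cov}(\lambda_j, \lambda_k) = \frac{2\lambda_j^2}{n}\,\delta_{jk}. \qquad (43.47)$$

Hence $\lambda_j$, $\lambda_k$ are uncorrelated for $j \neq k$ and

$$\operatorname{var}\lambda_j = \frac{2\lambda_j^2}{n}. \qquad (43.48)$$

To our approximation this entails

$$\operatorname{var}\log\lambda_j = \frac{2}{n}, \qquad (43.49)$$

a convenient form since the variance does not depend on the parameters $\lambda$.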
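
These large-sample formulae can be examined by simulation. The following is a rough sketch in Python with NumPy, not a derivation: the parent roots, sample size, number of replications and seed are all assumed for illustration, and normal variation with distinct parent roots is simulated directly.

```python
import numpy as np

rng = np.random.default_rng(3)
parent = np.array([4.0, 2.0, 1.0])       # distinct parent latent roots (assumed)
n, reps = 400, 2000                      # sample size and replications (assumed)

# reps independent normal samples with dispersion matrix diag(parent)
samples = rng.normal(size=(reps, n, parent.size)) * np.sqrt(parent)
samples -= samples.mean(axis=1, keepdims=True)
c = np.einsum('rni,rnj->rij', samples, samples) / n

# Sample latent roots of each dispersion matrix, in diminishing order
roots = np.linalg.eigvalsh(c)[:, ::-1]

emp_var = roots.var(axis=0)              # simulated variance of each root
theory = 2.0 * parent ** 2 / n           # (43.48)
emp_var_log = np.log(roots).var(axis=0)  # to be compared with 2/n, (43.49)

print(np.round(emp_var / theory, 2), np.round(emp_var_log * n / 2.0, 2))
```

Both sets of printed ratios should lie close to unity; the second illustrates that the variance of $\log\lambda_j$ does not depend on the size of the root.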

43.17 Once again, the results for a correlation matrix, as distinct from a dispersion
matrix, are much more complicated. We quote the results from Girshick (1939):


cov 


(44) = ;{2 ««-(44-4)2£i&}. 

7* a,)3 a 

n a. P a 


var 



»' ' «./» 

where r a/9 typifies correlations 

43.18 The same method may be used to derive variances and covariances for the
coefficients of the latent vectors. For what they are worth we quote the results.
It must be remembered that in practice we should rarely wish to test an individual
direction cosine.

cov fe, u = ~w it y 

n(X t ~XM ’ * (43.52) 


72 32 

var hk = A- 




(43.53) 
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w 1 e C0V (4,4*) is given by (43.53) with each 7° ^'~4) 2 / 

For some recent work see T. W. Anderson (1963), who proves the asymptotic normality
of the distribution of the latent roots and vectors and deals with the case where
parent roots are equal.
43.19 Suppose, further, that the first latent root is dominant, accounting for,
say, 70 or 80 per cent of the variance. It may be a rather Procrustean procedure to
neglect the remainder and to force the whole variation, so to speak, in the direction
of the first latent vector. But there are cases where we may wish to do this; for
example, if the $x$'s are values of business activity indices of one kind or another, bank
deposits, freight loadings, imports and so on, we may be willing to allow the first
principal component to determine a single number expressing the general intensity
of business activity. The values of $\xi_1$ then become a weighted index number of the
constituent values of $x$. Whether this index remains a pure artefact, or whether it
corresponds to some real intensity of business activity, is a matter of interpretation
to be decided in the light of our knowledge about the economic structure of the system
under study.

Kendall (1961b) has shown how a fair approximation to the ordering of a set by the 
first principal component can be attained by ranking methods. Little else seems to be 
known about distribution-free methods in the field of canonical analysis. 

Example 43.4 (Craddock, 1965 with some supplementary information kindly supplied 
by him in correspondence) 

Manley (1953 and later) has constructed a remarkably long series of monthly 
temperatures for Central England from 1680 to 1963. The data are in degrees 
Fahrenheit to the nearest tenth of a degree. The year, for the purpose of the analysis, 
was taken to run from November through the following October. 

Each year was taken as a 12-dimensional quantity, one value for each month. Thus 
no scaling problem arises. The variate values were measured from the mean of the 
whole series, not the individual monthly means. This leaves in the picture the varia¬ 
tion of temperature over the year, and we shall not be surprised to find annual variation 
in a dominant position. 

There were thus 283 sets of monthly mean temperatures running from November 
1680 to October 1963. 

In the treatment earlier in this chapter we have assumed the values of each component
of $x$ to be measured from the mean of that component. If we measure from
some other value, our product-sums divided by $n$ are no longer covariances but second-
order moments. The analysis remains valid, but we must expect one component,
probably the first, to correspond to an axis from the alternative mean through the sample
mean.

The product-moment matrix is shown in Table 43.1, overleaf. 
The first ten latent roots of this matrix are as follows:

Latent root number      Value as percentage of variance

        1                          92.38
        2                           2.05
        3                           1.12
        4                           0.98
        5                           0.67
        6                           0.58
        7                           0.49
        8                           0.45
        9                           0.41
       10                           0.36

Sum of first 10                    99.47


The amount of variation accounted for by the first latent vector is unusually high,
but there is, of course, a reason: the major variation is a seasonal movement. The
coefficients of the first four latent vectors are given in Table 43.3 later. Plotted against 
the monthly means given in Table 43.2, they are seen to pursue an almost identical 
pattern of seasonal movement. 

In psychological or economic work we should hardly bother to consider the other 
latent roots. However, the interest of the present example is that sufficient know¬ 
ledge is available of the physical system which generates the data to enable some 
attempt at interpretation. Craddock, to whose paper reference may be made for 
details, identifies the second component with climatic changes in the annual mean 
temperature, and the third and fourth with patterns of variation in winter temperature. 

Table 43.2 gives the covariances, moments being measured about the monthly 
means. 

The first four latent roots of this matrix are:

Latent root number      Value as percentage of variance

        1                          27.50
        2                          15.20
        3                          10.84
        4                          10.78

Sum of first 4                     64.32


The picture of residual variation, after the abstraction of seasonal components, 
is now much less clear. From the coefficients of the latent vectors, which are given in 
Table 43.3, it appears that the first (whose coefficients are all positive) represents 
movement of a secular kind; the second and third indicate a harmonic movement 
over the year and, as in the analysis of Table 43.1, seem to represent a type of variation 
in winter temperature. 
It is interesting to see what happens to this analysis if we standardize by
reducing the covariances of Table 43.2 to correlations. The corresponding figures
are:

Latent root number      Value as percentage of variance

        1                          22.54
        2                          11.51
        3                          10.29
        4                           8.93

Sum of first 4                     53.27


Although the differences are not so very large, they are appreciable. Even in
this case, then, when all the $p$ components of the vector are measured in the same
units, standardization makes a difference. We should expect greater differences in 
cases where the components are measured in units of different range. 


Canonical correlations 

43.20 The transformation of a set of variables $x$ to a canonical set $\xi$ is effectively
the reduction of a quadratic form to a sum of squares by linear transformation. We
have now to consider the general theory of the relations between two sets of variates
$x_1, \ldots, x_p$ and $x_{p+1}, \ldots, x_{p+q}$, where we suppose that $p \le q$. Following Hotelling
(1936) we shall show that in general there can be found linear transformations to
variates $\xi_1, \ldots, \xi_p, \xi_{p+1}, \ldots, \xi_{p+q}$, such that

(a) all the $\xi$'s have unit variance and zero mean;

(b) any $\xi$ in the $p$-group is uncorrelated with the other $\xi$'s in that group;

(c) any $\xi$ in the $q$-group is uncorrelated with the other $\xi$'s in that group;

(d) the correlation between any $\xi$ in the $p$-group and any $\xi$ in the $q$-group is zero except
for $p$ correlations $\rho_1, \rho_2, \ldots, \rho_p$, which may be taken to be the correlations between
$\xi_1$ and $\xi_{p+1}$, $\xi_2$ and $\xi_{p+2}$, ..., $\xi_p$ and $\xi_{2p}$.

The variates $\xi$ are then said to be in canonical form and the $\rho$'s are called canonical
correlations. We have already discussed canonical correlation in the context of the
analysis of categorized data in 33.44-9, Vol. 2.

In the case of a single set of variables we were able to ensure that the $\xi$'s in turn
accounted for as much as possible of the total variation. This is no longer possible
here. The optimization is concerned with the reduction in the intercorrelations to
a minimal set.

We will suppose that our variables $x$ have zero means and dispersions typified by
$\gamma$. Those dispersions in the $p$-group we denote by Greek suffixes, $\gamma_{\alpha\beta}$, and those
in the $q$-group by Roman suffixes, $\gamma_{ab}$. For the covariance of a $p$-variate and a $q$-variate
we write one Greek and one Roman suffix: $\gamma_{\alpha a}$.

To simplify the notation we will omit suffixes referring to sample labels. Indeed,
we can go further and omit other suffixes identifying $\xi$, and the corresponding
coefficients in the transformation. Consider now a particular pair of variables, one
from each group, given by

\[ \xi = \sum_{\alpha} l_\alpha x_\alpha, \qquad \alpha = 1, 2, \ldots, p, \tag{43.54} \]

\[ \eta = \sum_{a} m_a x_a, \qquad a = p+1, p+2, \ldots, p+q. \tag{43.55} \]

We take these to have unit variances and hence

\[ \sum_{\alpha,\beta} l_\alpha l_\beta \gamma_{\alpha\beta} = 1, \tag{43.56} \]

\[ \sum_{a,b} m_a m_b \gamma_{ab} = 1. \tag{43.57} \]


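We seek the condition that their correlation $R$ is stationary for variations in the
coefficients $l$ and $m$, namely that

\[ R = \sum_{\alpha,a} l_\alpha m_a \gamma_{\alpha a} \tag{43.58} \]

is stationary. Taking two undetermined multipliers $\tfrac12\lambda$ and $\tfrac12\mu$, we then have to find an
unconditioned stationary value of

\[ \sum_{\alpha,a} l_\alpha m_a \gamma_{\alpha a} - \tfrac12\lambda \sum_{\alpha,\beta} l_\alpha l_\beta \gamma_{\alpha\beta} - \tfrac12\mu \sum_{a,b} m_a m_b \gamma_{ab}. \tag{43.59} \]

On differentiation this leads to

\[ \begin{aligned}
\sum_{\alpha} l_\alpha \gamma_{\alpha a} - \mu \sum_{b} m_b \gamma_{ab} &= 0, \\
\sum_{a} m_a \gamma_{\alpha a} - \lambda \sum_{\beta} l_\beta \gamma_{\alpha\beta} &= 0.
\end{aligned} \tag{43.60} \]

Multiplying the first equation by $m_a$ and summing, and the second by $l_\alpha$ and summing,
we find in virtue of (43.56)-(43.58),

\[ R = \lambda = \mu. \tag{43.61} \]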

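Equations (43.60) are then solvable for $l$ and $m$ if their determinant vanishes. Writing
$\lambda$ for $\mu$, we find the determinant of order $p+q$

\[ \begin{vmatrix} -\lambda\gamma_{\alpha\beta} & \gamma_{\alpha a} \\ \gamma_{a\alpha} & -\lambda\gamma_{ab} \end{vmatrix} = 0, \qquad \alpha, \beta = 1, 2, \ldots, p; \quad a, b = 1, 2, \ldots, q. \tag{43.62} \]

Multiplying the first $p$ rows by $-\lambda$ and dividing the last $q$ columns by $-\lambda$ we find

\[ (-\lambda)^{q-p} \begin{vmatrix} \lambda^2\gamma_{\alpha\beta} & \gamma_{\alpha a} \\ \gamma_{a\alpha} & \gamma_{ab} \end{vmatrix} = 0. \tag{43.63} \]

If we insert another determinant of order $p+q$ on the left of (43.63), it will still equal
zero. We insert

\[ \begin{vmatrix} I & -\gamma_{\alpha a}\gamma_{ab}^{-1} \\ 0 & \gamma_{ab}^{-1} \end{vmatrix}. \]

Since the determinant of a product is the product of the determinants, (43.63) becomes

\[ (-\lambda)^{q-p} \begin{vmatrix} \lambda^2\gamma_{\alpha\beta} - \gamma_{\alpha a}\gamma_{ab}^{-1}\gamma_{b\beta} & 0 \\ \gamma_{ab}^{-1}\gamma_{b\beta} & I \end{vmatrix} = 0 \]

or

\[ (-\lambda)^{q-p} \left| \lambda^2\gamma_{\alpha\beta} - \gamma_{\alpha a}\gamma_{ab}^{-1}\gamma_{b\beta} \right| = 0. \tag{43.64} \]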
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"a simple root of («•<”! — , ro ot of multipuciry r they are det^ 
except for a ^^nts, a result which we take without p r00 f ^ 

% o“ ragebrS'forms the fs in each group are uncorrelated and ^ 

To complete, we need P f in one group is uncorrelated with H 
art from the canonical correla determine the corresponding com/ { 

”“•* 4 » d * 

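From (43.60), for the root $\rho_i$ and its coefficients $l_{i\alpha}$, $m_{ia}$,

\[ \sum_{\alpha} l_{i\alpha} \gamma_{\alpha a} = \rho_i \sum_{b} m_{ib} \gamma_{ab}, \tag{43.65} \]

\[ \sum_{a} m_{ia} \gamma_{\alpha a} = \rho_i \sum_{\beta} l_{i\beta} \gamma_{\alpha\beta}. \tag{43.66} \]

Similar equations obtain for a second pair, say $\xi_j$ and $\eta_j$. Between these four variates
there are six correlations, of which two are $\rho_i$ and $\rho_j$. It will be enough to show
that the other four vanish. They are

\[ E(\xi_i\xi_j) = \sum l_{i\alpha} l_{j\beta} \gamma_{\alpha\beta}, \qquad E(\eta_i\eta_j) = \sum m_{ia} m_{jb} \gamma_{ab}, \tag{43.67} \]

\[ E(\xi_i\eta_j) = \sum l_{i\alpha} m_{ja} \gamma_{\alpha a}, \qquad E(\xi_j\eta_i) = \sum l_{j\alpha} m_{ia} \gamma_{\alpha a}. \tag{43.68} \]

Multiply (43.65) by $m_{ja}$ and sum. In virtue of (43.68) we have

\[ E(\xi_i\eta_j) = \rho_i E(\eta_i\eta_j). \tag{43.69} \]

Likewise from (43.66) multiplied by $l_{j\alpha}$ we find

\[ E(\xi_j\eta_i) = \rho_i E(\xi_i\xi_j). \tag{43.70} \]

Interchanging $i$ and $j$ in (43.70) and comparing with (43.69), we find

\[ \rho_i E(\eta_i\eta_j) = \rho_j E(\xi_i\xi_j), \tag{43.71} \]

and, interchanging $i$ and $j$ in this,

\[ \rho_j E(\eta_i\eta_j) = \rho_i E(\xi_i\xi_j), \tag{43.72} \]

so that unless $\rho_i^2 = \rho_j^2$,

\[ E(\eta_i\eta_j) = E(\xi_i\xi_j) = 0. \tag{43.73} \]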

similar wav the nth*- „ 
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> T onhogonai conditions — 


(43.73) 


we may then 

ensurincr thnt 
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V“en in ,his case (43 ' 69 ) and («.70) show ttaTr VanUhes '“>>'** Pi = Pi - 0- 

i *-> ch00se 0Ur aSS,gnable constar it3 SO that t hr o * “,"h Vamsh ’ “* 


r& 


-~ 

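43.22 When the variables are put into canonical form the dispersion matrix
reduces to (written in partitioned form, with $P = \operatorname{diag}(\rho_1, \rho_2, \ldots, \rho_p)$)

\[ \begin{pmatrix} I_p & P & 0 \\ P & I_p & 0 \\ 0 & 0 & I_{q-p} \end{pmatrix}, \tag{43.74} \]

with a determinant equal to

\[ (1-\rho_1^2)(1-\rho_2^2) \cdots (1-\rho_p^2). \tag{43.75} \]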
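
The determinant at (43.75) can be checked numerically. The following is a minimal sketch: the helper name `canonical_dispersion` and the choice $q = 3$ are assumptions made for the illustration, and the two correlations are simply convenient values.

```python
import numpy as np

def canonical_dispersion(rho, q):
    """Dispersion matrix (43.74) for p = len(rho) canonical correlations and a q-group, q >= p."""
    rho = np.asarray(rho, dtype=float)
    p = rho.size
    if q < p:
        raise ValueError("the text supposes p <= q")
    matrix = np.eye(p + q)
    idx = np.arange(p)
    # xi_i in the p-group is correlated only with xi_{p+i} in the q-group.
    matrix[idx, p + idx] = rho
    matrix[p + idx, idx] = rho
    return matrix

rho = np.array([0.3945, 0.0688])
matrix = canonical_dispersion(rho, q=3)

det_direct = np.linalg.det(matrix)      # determinant of (43.74)
det_formula = np.prod(1.0 - rho ** 2)   # product (43.75)
print(det_direct, det_formula)
```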


Example 43.5 (from Hotelling (1936), dealing with data of T. L. Kelley)

140 seventh-grade schoolchildren were given four tests in (a) reading speed, (b) reading
power, (c) arithmetic speed, and (d) arithmetic power. It is required to find canonical
variates for the two reading tests and the two arithmetic tests.

The correlations between the variates were:

          x_1        x_2        x_3        x_4

x_1     1.0000     0.6328     0.2412     0.0586
x_2     0.6328     1.0000    -0.0553     0.0655
x_3     0.2412    -0.0553     1.0000     0.4248
x_4     0.0586     0.0655     0.4248     1.0000

The determinant (43.62) becomes the symmetric determinant

\[ \begin{vmatrix}
-\lambda & -0.6328\lambda & 0.2412 & 0.0586 \\
 & -\lambda & -0.0553 & 0.0655 \\
 & & -\lambda & -0.4248\lambda \\
 & & & -\lambda
\end{vmatrix} = 0, \tag{43.76} \]
This reduces to

\[ 0.491{,}37\,\lambda^4 - 0.078{,}803{,}4\,\lambda^2 + 0.000{,}362{,}490 = 0, \]

giving

\[ \lambda^2 = 0.155{,}635 \text{ or } 0.004{,}740, \qquad \lambda = 0.3945 \text{ or } 0.0688. \]

To find the canonical variates we substitute in (43.60). With 0.3945 for $\lambda$ we find, after
division by $-\lambda$,

\[ \begin{aligned}
l_1 + 0.6328\,l_2 - 0.6114\,m_1 - 0.1485\,m_2 &= 0, \\
0.6328\,l_1 + l_2 + 0.1402\,m_1 - 0.1660\,m_2 &= 0, \\
-0.6114\,l_1 + 0.1402\,l_2 + m_1 + 0.4248\,m_2 &= 0, \\
-0.1485\,l_1 - 0.1660\,l_2 + 0.4248\,m_1 + m_2 &= 0.
\end{aligned} \]

The last equation is linearly dependent on the other three and so adds nothing. From
the other three we solve for the ratios of $l$'s and $m$'s, finding

\[ l_1 : l_2 : m_1 : m_2 = -2.7772 : 2.2655 : -2.4404 : 1. \]

Thus the transformed variates are

\[ \begin{aligned}
k_1\xi_1 &= -2.7772\,x_1 + 2.2655\,x_2, \\
k_2\eta_1 &= -2.4404\,x_3 + x_4,
\end{aligned} \tag{43.81} \]

where $k_1$ and $k_2$ may be chosen so that the variances of $\xi_1$ and $\eta_1$ are unity if desired.
Similar equations with the root 0.0688 will give us a further pair of canonical co-ordinates.
Those we have worked out have the maximum correlation, the other pair
having the minimum and therefore being of less interest.
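
The arithmetic of Example 43.5 can be reproduced numerically. The sketch below uses only the correlations quoted in the example; the route taken (latent roots of the matrix product $\gamma_{11}^{-1}\gamma_{12}\gamma_{22}^{-1}\gamma_{21}$, followed by the second of equations (43.60) for the $m$'s) is one convenient way of organising the calculation and is not claimed to be the method used in the original computation. Variable names are illustrative.

```python
import numpy as np

# Correlations of Example 43.5 (x_1, x_2 reading tests; x_3, x_4 arithmetic tests).
R = np.array([
    [1.0000, 0.6328, 0.2412, 0.0586],
    [0.6328, 1.0000, -0.0553, 0.0655],
    [0.2412, -0.0553, 1.0000, 0.4248],
    [0.0586, 0.0655, 0.4248, 1.0000],
])
p = 2
g11, g12 = R[:p, :p], R[:p, p:]
g21, g22 = R[p:, :p], R[p:, p:]

# Latent roots of g11^{-1} g12 g22^{-1} g21 are the squared canonical correlations.
product = np.linalg.solve(g11, g12) @ np.linalg.solve(g22, g21)
roots, vectors = np.linalg.eig(product)
order = np.argsort(roots.real)[::-1]
rho = np.sqrt(roots.real[order])

# Coefficients for the largest root: l is the latent vector, and
# g21 l = rho * g22 m (from (43.60) with lambda = mu = rho) then gives m.
l = vectors.real[:, order[0]]
m = np.linalg.solve(g22, g21 @ l) / rho[0]

# Express as ratios l_1 : l_2 : m_1 : m_2 with the last coefficient set to 1.
ratios = np.concatenate([l, m]) / m[-1]
print(np.round(rho, 4), np.round(ratios, 4))
```

The two canonical correlations agree with the values 0.3945 and 0.0688 of the example, and the ratios with those quoted before (43.81), to within rounding.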

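43.23 Standard errors may be obtained in the manner of 43.16. Starting from

\[ \sum l_\alpha l_\beta c_{\alpha\beta} = 1, \tag{43.83} \]

\[ \sum m_a m_b c_{ab} = 1, \tag{43.84} \]

\[ \sum l_\alpha m_a c_{\alpha a} = r, \tag{43.85} \]

we differentiate to find

\[ 2\sum l_\alpha c_{\alpha\beta}\, dl_\beta + \sum l_\alpha l_\beta\, dc_{\alpha\beta} = 0, \tag{43.86} \]

\[ 2\sum c_{ab} m_a\, dm_b + \sum m_a m_b\, dc_{ab} = 0, \tag{43.87} \]

\[ dr = \sum l_\alpha m_a\, dc_{\alpha a} + \sum l_\alpha c_{\alpha a}\, dm_a + \sum m_a c_{\alpha a}\, dl_\alpha. \tag{43.88} \]

Without loss of generality we may now suppose the variables put into canonical form.
All $l$'s and $m$'s except $l_1$ and $m_1$ vanish and we have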


Similar equations 

Multiplying 


(43.92) and 


dc i. P . 

y o 

dc i, 29+2~ \r 2 

(43.93), taking 


(43.89) 

(43.90) 

(43.91) 


— „ me vat 

vanish and we have 

2<f4“f ~dc n = 0 
df _ 2dm > + dc ,«.vn = 0 

Substituting from the first ^. 

dr l = i c , 0 * ese cc l u; itions, we find 

, 1 l 'P+i-¥i(dc u + dc \ 

PP y *° an y other simple root ' 

dr 9 = rj r , ’ ^ 0r exa niple 

J! c r +dc ^- < 43 - 93 ) 

cov (r„ r.) r.r an<J US ' nS (41 ' 98) ^ find 





likewise 
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. t u similar formulae for the other correlations Tt ;« 43 ’ 9 ^ 

l** ‘^- sample f ° rmUla f0r an ordinar y product“fL“ S „ iS ^ 

Hotelling (1936), to whom this derivation is <\ u 
zero root accordingly has multiplicity t then ; ^ e ’ S ™ ed , that if P = 2, q>2 and a 
a canonical correlation vanishes and p = a (41 qr? 1 f r !j Uted . as ^ l ~ x d.fr. If 
sample values near the zero root must bealWrtl Wlth . the qualification that 

or alternatively that the distribution of r is that *u° L aV ? posit * ve or negative values, 
Lawley (1959) derives expressions for the third * “ b * olut f value of a normal variate, 
considers the variance-stabilizing transformation inv^' f ° Urth co J nulants of r - He also 

- •» -** *■ &V"* are 


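43.24 It follows from (43.64) that the $\rho^2$ are the roots of the determinantal
equation

\[ \left| \rho^2\gamma_{11} - \gamma_{12}\gamma_{22}^{-1}\gamma_{21} \right| = 0 \tag{43.96} \]

or

\[ \left| \rho^2 I - \gamma_{11}^{-1}\gamma_{12}\gamma_{22}^{-1}\gamma_{21} \right| = 0, \tag{43.97} \]

where $\gamma_{11}$ is the dispersion matrix of the $p$ variables $x_1$ to $x_p$, $\gamma_{22}$ that of $x_{p+1}, \ldots, x_{p+q}$, $\gamma_{12}$ is
the covariance matrix between the $p$ variables and the $q$ variables, and similarly for
$\gamma_{21}$. Thus the $\rho^2$ are the latent roots of the matrix product in (43.97).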


43.25 The results of canonical correlation analysis are even more difficult to
interpret than those of component analysis. It is best regarded as an exploratory
tool which will give us some idea of the structure of the multivariate complex under
study, and in any case tells us what can be the maximum amount of correlation between
linear functions of the two groups of variables. The literature of the subject has few
examples of useful practical application; cf. Barnett and Lewis (1963) for one in educational
research. For this reason we will pass rather quickly without proof over some
remaining theoretical points.

(a) For simplicity of exposition we have supposed $q \ge p$. If $q < p$ we simply reverse the
roles of the two groups.

(b) If we insert ML sample values for the matrices in (43.97) we obtain ML estimates
of the canonical correlations.

(c) Looking at the matrices entering into (43.64), we see that one is the dispersion
of the $p$-group and the other (product of three) can be regarded as the contribution
from regression of the $p$-group on the fixed $q$-group. Thus the theory of regression
(42.15-20) applies here. The distribution of the latent roots $\rho^2$ in (43.97) is that
of the latent roots in 41.22-3, provided that the $p$-group and the $q$-group are independent,
which unfortunately is the case of least interest.

(d) Bartlett (1947) proposed a test, analogous to that of 43.14, based on the expression
of the correlation determinant as the product of $p$ factors $1-\rho_j^2$. If $k$ canonical
correlations have been accepted as non-zero, the criterion for testing that the



Ttf£ aE> v 

zer „is _ 1) + S 0-=} 

oth ers^ + » A) degrees of freedom. 

<■ * **1i (i)—'fcjW . c-~4-nr\T rpftnlts 




u “ „{»->■"* .„ k )la-k) d« recs 

. „ wi* satisfactory results. 

, vl, is appro^^Lt with reasons y ^ app ii ca tion can be 
wh,C vl*ated this est towa rds pr others are . Sce , 


Lawl, 


(43i 


ley (l^ 


apP^Twest with ***»- ' ctic al application can be made, na 
tigated some prog resS t0 ^ t Z ero but the others are. See Bartlett (A 

(f) Dempster t Q ue „ 0 uille s m 

1 correlations by U 


/f) Lfciu^ : - ' Quenonui- 
1 correlations by U 

. far discussed in this chapter are design 

Factor analy . odg we have so cture it may have. Those we now f Xam 

43,26 system to see what sort of ^ end We begln with some ® 
examine a sy speak, from , ; t p ts the data and, if m, 

tackle the problem so t P fe ^ see whethe r 


to 


a(pxl)^vector^x^We^pp^ ch afe known as factors, there being m<p 0 f ^ 
Thus we have m 


\[ x_j = \sum_{k=1}^{m} l_{jk}\xi_k + \varepsilon_j, \qquad j = 1, 2, \ldots, p. \tag{43.99} \]

The coefficients $l$ are not now the constants of a rotation to new axes. As in component
analysis they are referred to as factor loadings (a term surviving from early psychological
usage for what are more familiar to the statistician as "weights"). The $\xi$'s are
assumed to be independent normal variables with zero mean and unit variance. Since
our $x$-complex, in general, is not representable in fewer than $p$ dimensions, an exact
representation of $x$'s in terms of $\xi$'s requires an error term $\varepsilon$. As part of the model
we suppose that $\varepsilon_j$ is independent of $\varepsilon_k$ ($k \ne j$) and of all the $\xi$'s. Our problem is to estimate
the constants $l$ and the variances $\sigma_j^2$ of the $\varepsilon$'s.

This is not a regression model. Our $\xi$'s are random variables which we do not
regard as fixed quantities like regressors. The relationship is structural in the sense
of Chapter 29, Vol. 2.


43.27 The first thing to notice about the model is that it is undetermined. We
are expressing a $p$-dimensional complex in terms of $m+p$ random variables. We have
already imposed the condition that the $\xi$'s are $N(0, 1)$. We shall also require the $\varepsilon$'s to
have zero mean. The question is whether, in conjunction with the conditions of independence
among and between $\xi$'s and $\varepsilon$'s, the problem of estimating $pm$ constants $l$ and $p$ constants
$\sigma^2$ is determinate.

Since the $\xi$'s and $\varepsilon$'s are normal, the $x$'s are also normal. We have then

\[ \operatorname{cov}(x_j, x_k) = \sum_{t=1}^{m} l_{jt} l_{kt}, \qquad j \ne k, \tag{43.100} \]

\[ \operatorname{var} x_j = \sum_{t=1}^{m} l_{jt}^2 + \sigma_j^2. \tag{43.101} \]

We may summarize these relations in

\[ \gamma = l\,l' + \Sigma, \tag{43.102} \]

where $l$ is the $p \times m$ matrix of coefficients $l_{jk}$ and $\Sigma$ is the $p \times p$ diagonal matrix of $\sigma_j^2$.

The number of dispersions on the left in (43.102) is $\tfrac12 p(p+1)$. The number of
constants on the right is $p(m+1)$. Thus if $m+1 > \tfrac12(p+1)$ there are not enough
relations in (43.102) to determine the constants. We shall, in fact, normalize the
constants $l$ by requiring that

\[ \sum_{i} l_{ij} l_{ik} / \sigma_i^2 = 0, \qquad j \ne k, \tag{43.103} \]

or equivalently that

\[ l'\,\Sigma^{-1} l = J, \tag{43.104} \]

an $m \times m$ diagonal matrix. This imposes a further $\tfrac12 m(m-1)$ conditions on the constants
under estimate. The equations will be indeterminate if

\[ \tfrac12 p(p+1) < p(m+1) - \tfrac12 m(m-1), \]

which reduces to

\[ (p-m)^2 < p+m. \tag{43.105} \]

We shall therefore assume that the contrary is true.

Example 43.6

The inequality of (43.105) reversed is equivalent to

\[ \{(p+\tfrac12) - m\}^2 > \tfrac14(8p+1). \tag{43.106} \]

For example, with $p = 5$ our model is indeterminate if $m$ is greater than 2. We
should not set up a model of a 5-dimensional complex with more than two factors.
For $p = 10$ the largest admissible value of $m$ is 5.
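
The counting argument of 43.27 and the two cases of Example 43.6 are easily checked. In the sketch below the function names are hypothetical, and "admissible" is taken to mean the strict inequality $(p-m)^2 > p+m$ with $1 \le m < p$.

```python
def is_determinate(p, m):
    """True when (43.105) is reversed, i.e. (p - m)^2 > p + m."""
    return (p - m) ** 2 > p + m

def largest_admissible_m(p):
    """Largest number of factors m < p for which the model is determinate (0 if none)."""
    admissible = [m for m in range(1, p) if is_determinate(p, m)]
    return max(admissible) if admissible else 0

def surplus_relations(p, m):
    """Relations minus free constants: p(p+1)/2 - {p(m+1) - m(m-1)/2}."""
    return p * (p + 1) // 2 - (p * (m + 1) - m * (m - 1) // 2)

for p in (5, 10):
    print(p, largest_admissible_m(p))
```

The surplus computed by `surplus_relations` equals $\tfrac12\{(p-m)^2 - (p+m)\}$, so it is positive exactly when the model is determinate in the above sense.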


43.28 The reason for imposing the orthogonality conditions (43.103) is as follows.
Consider a non-singular orthogonal transformation of the $\xi$'s to new variables $\eta$ given by

\[ \xi = M\eta. \]

The variables $\eta$ will also be $N(0, 1)$ and independent, and in place of (43.102) we should
have

\[ \gamma = lM(lM)' + \Sigma = lMM'l' + \Sigma = l\,l' + \Sigma. \]

In short, our $\xi$'s are indeterminate within an orthogonal transformation. Equation
(43.104) resolves the indeterminacy in a convenient way, but there are other methods
of doing so.
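
The indeterminacy can be seen numerically. The following sketch uses randomly generated loadings and error variances (purely hypothetical values) and shows, first, that an orthogonal $M$ leaves the dispersion matrix unchanged and, secondly, that one particular rotation makes $l'\Sigma^{-1}l$ diagonal as (43.104) requires. The choice of rotation, by the latent vectors of $l'\Sigma^{-1}l$, is one convenient construction and is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 6, 2
loadings = rng.normal(size=(p, m))               # hypothetical p x m matrix l
sigma = np.diag(rng.uniform(0.5, 1.5, size=p))   # hypothetical diagonal Sigma

# An orthogonal transformation M (a plane rotation through 0.7 radians).
theta = 0.7
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])

gamma = loadings @ loadings.T + sigma
gamma_rotated = (loadings @ M) @ (loadings @ M).T + sigma
same_dispersion = np.allclose(gamma, gamma_rotated)

# Resolve the indeterminacy: rotate so that l' Sigma^{-1} l is diagonal.
J = loadings.T @ np.linalg.inv(sigma) @ loadings
_, V = np.linalg.eigh(J)
normalized = loadings @ V
J_normalized = normalized.T @ np.linalg.inv(sigma) @ normalized
print(same_dispersion, np.round(J_normalized, 10))
```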


43.29 If, as we henceforth suppose, $(p-m)^2 > p+m$ we have, in equations (43.102)
and (43.103), more equations than constants. We cannot therefore solve them as




rcD theory OF stat.st.cs 

the ADVANO lotion procedure. FoU owi 

* hut require son* £ ^ roeth od of maxrmum likeli&X, 

(U 'oil peculiar *Jj“££ tion , the ^ ^ constraints of the estJ matio S 

estimation in Tius * ^ cou ld take the observed c s as esti^X 

of , h e parent eoV and ( +3 .l° 3 ). ^ N 

s ; s nt wt should have simply C= lt'+S 


/* 


°n> 


, „ e second equation is true but the first is only true of the 

4 we Shall see short y, uMihood function 

iagonal elements. ;thro of the l * ellh0 

We start from the g ^ constant _| B log | y I S r * c *> (43 , 

Substitution from (43.102) for y gives us a fun«i„ n J 
here r is inverse to r. £ whe re there rs no amb, gutty we omit cir CUtn ™ 

e maximize for variauui « 

:cents for ease of P**™* 2 gives us, after some algebra, 

Differentiation with respect ts 

V tt - S i (j c jk L kt 
j> k 

hich is, summarized for all t, equivalent to 

diag (y-r-Y ‘ C Y ) - M - 

rpcnpr f +0 h gives, after some reduction, 

Differentiation with respect 10 i ]k give , , 

£ l [k r (j — £ ltk ^ tu ^uv ® 

t t, «>» 

iich is the element in the 7 th row and kth column of 

Fy-i-Ty-icy- 1 = 0. 

> be consistent we must have, as well as (43.108) and (43.110), the equations 
d (43.104) applying to ML estimators, 

y = U' + 2 


(43.108) 


(43.109) 




(43.110) 


(43.102) 


j = r 2- 1 1. 

.110), postmultiplying by y, we have 

V 


(43.111) 


(43.112) 


r-r y _i c = 0 


(43.113) 


he 1 = L y -1 . 

(43.109) by y— 11', which is equal to 2. We find 
diag (I-cy-i-U' v -i + lr r „ = 
,43.1131. reduces te 


(43.114) 


to 


— cy — J 

irtue of (43.113), reduces „ 

diag (I-cy-i) = n 
K on the right by v- lr w ‘ , . 

3 ' 1 ' We find similarly 

d “g (y-cl = n 


(43.115) 
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is equivalent *° die equations 


m 


°j = c»- X If,. 

7j=-1 


f fj 0 w fr° m (+3.112), postmultiplying by 1', we g nd 
i Ji' = rs-iu = i' S -w v _ 


= 1 ' S-iy-l' 


(Y-S) 


Thus 

JI'y -1 = l's-i -r Y -i 

which, in virtue of (43.114), reduces to 

Jl'c 1 = V 2-i-l'c-i 
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(43.117) 


(43.118) 


giving 


Ji' = i' (Z-'c-tt-i) 
= l's-^c-t). 


(43.119) 


43.30 The equations are still troublesome to solve. Recalling that $J$ is diagonal,
we see from (43.119) that its elements are the latent roots of $\Sigma^{-1}(c-\Sigma)$. One iterative
procedure is to guess some values of $\Sigma$, determine from (43.119) the latent vectors $l$,
substitute in (43.117) to improve the estimate of $\Sigma$, iterate with these improved
estimates in (43.119), and so on. We can estimate $\gamma$ from (43.111).

The process may, however, converge very slowly (cf. Howe, 1955) and it appears
that on occasion the estimates of some of the $\sigma^2$ tend to zero. It cannot be said that
this subject has been mastered.
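
The iterative procedure just described may be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names, the starting value (half of each diagonal element of $c$), the fixed number of cycles and the small `floor` that keeps the error variances positive are all choices made for the sketch. The latent vectors of $\Sigma^{-1}(c-\Sigma)$ are obtained from the equivalent symmetric matrix $\Sigma^{-1/2}c\,\Sigma^{-1/2} - I$, and are scaled so that $l'\Sigma^{-1}l = J$.

```python
import numpy as np

def ml_factor_step(c, sigma2, m):
    """One cycle: loadings from the latent-root equation, then new error variances."""
    scale = 1.0 / np.sqrt(sigma2)
    # Symmetric form of Sigma^{-1}(c - Sigma): Sigma^{-1/2} c Sigma^{-1/2} - I.
    sym = scale[:, None] * c * scale[None, :] - np.eye(c.shape[0])
    roots, vectors = np.linalg.eigh(sym)                    # ascending order
    roots, vectors = roots[::-1][:m], vectors[:, ::-1][:, :m]
    # Scale the latent vectors so that l' Sigma^{-1} l = diag(roots) = J.
    loadings = np.sqrt(sigma2)[:, None] * vectors * np.sqrt(np.clip(roots, 0.0, None))
    # Improved error variances: sigma_j^2 = c_jj - sum_k l_jk^2.
    new_sigma2 = np.diag(c) - np.sum(loadings ** 2, axis=1)
    return loadings, new_sigma2

def ml_factor_iterate(c, m, n_iter=200, floor=1e-6):
    """Repeat the cycle from a crude starting value; convergence is not guaranteed."""
    sigma2 = 0.5 * np.diag(c)
    for _ in range(n_iter):
        loadings, sigma2 = ml_factor_step(c, sigma2, m)
        sigma2 = np.clip(sigma2, floor, None)   # guard against variances tending to zero
    return loadings, sigma2
```

If $c$ is exactly of the form $l\,l' + \Sigma$, a cycle started from the true $\Sigma$ returns the same $l\,l'$ and $\Sigma$, which is the fixed-point property the iteration relies on. As the text warns, nothing here ensures rapid convergence from other starting values.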




43.31 When satisfactory estimates have been obtained, the usual type of likelihood
ratio can be used to test whether the number $m$ of factors which have been chosen is
satisfactory. Under the hypothesis that there are in fact $m$ factors, the log likelihood
is proportional to

\[ -\tfrac12 n \log|\gamma| - \tfrac12 n \operatorname{tr}(c\,\Gamma). \tag{43.120} \]

On the hypothesis that the $x$'s are normally and independently distributed with no
errors $\varepsilon$, the sample dispersions are estimators of the parent values and the log likelihood
is proportional to

\[ -\tfrac12 n \log|c| - \tfrac12 n \operatorname{tr}(c\,c^{-1}) = -\tfrac12 n \log|c| - \tfrac12 np. \]

Thus the ratio

\[ -n\left\{ \log\frac{|c|}{|\gamma|} - \operatorname{tr}(c\,\Gamma) + p \right\} \tag{43.121} \]

is distributed approximately as $\chi^2$. The number of degrees of freedom is the number
of constants fitted in the second case less those in the first, which is

\[ \tfrac12\{(p-m)^2 - (p+m)\}, \tag{43.122} \]

as we noted in (43.105).

Bartlett (1951) suggested that a better approximation would be obtained by using,
instead of $n$ in (43.121), the multiplier

\[ n' = n - \tfrac16(2p+11) - \tfrac23 m. \tag{43.123} \]
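
A minimal sketch of the test follows. The function name and argument names are hypothetical; `gamma_hat` stands for the fitted dispersion matrix $l\,l' + \Sigma$ under $m$ factors, and the optional Bartlett multiplier is taken in the form $n - \tfrac16(2p+11) - \tfrac23 m$, which should be checked against the source before serious use.

```python
import numpy as np

def factor_test_statistic(c, gamma_hat, n, m, bartlett=False):
    """Likelihood-ratio criterion (43.121) and its degrees of freedom (43.122)."""
    p = c.shape[0]
    _, logdet_c = np.linalg.slogdet(c)
    _, logdet_g = np.linalg.slogdet(gamma_hat)
    # tr(c Gamma), with Gamma the inverse of gamma_hat.
    trace_term = np.trace(np.linalg.solve(gamma_hat, c))
    multiplier = n - (2 * p + 11) / 6 - 2 * m / 3 if bartlett else n
    statistic = -multiplier * (logdet_c - logdet_g - trace_term + p)
    dof = ((p - m) ** 2 - (p + m)) / 2
    return statistic, dof
```

The statistic vanishes when the fitted matrix reproduces $c$ exactly and is otherwise positive; it would be referred to the $\chi^2$ distribution with `dof` degrees of freedom.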
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Uf) 




, is given by < 43 ' 123) ' ic computer, psychologists were co mp ,,. 

” • -t of the electr for obtaInmg solutions to pj, "• 

S P l V r^.y be, said * be moretftan me^ 


Before 


the advent 


where 

byafii'd^ "Some'of S ^ d “le with any degree of theorefeaT^ 

of factor and^' gh difficult “ oxiroatio ns from whtch the iterative ^ 
desperation; oth«, ylding «ts PP for detalls and numenca , il lust >' 

-SSrf&S^* (196Ib) and LawIey and Maxwen (>5 

,ponent analysis, we emerge with exp ress j 0 „ 


tions 

may 


be made to Harman 

. • . s in component -=- cxpressi, 

43.33 In factor analysis, as in component analysis, we emerge with expressions
giving the variables $x$ in terms of some unobserved (usually unobservable)
variables $\xi$. The difficulty, as a rule, is to know what the results mean. Psychologists
usually try to identify the $\xi$'s with some factors which they believe to underlie
the structure of the system; application of the same technique to physical systems
very often results in sets of variables to which no clear interpretation can
be given. For this and other reasons, we may return to the model to see
what simplification can be made in it.

43.34 We recall first of all that, to arrive at a unique solution, we imposed an
orthogonality condition (43.103) on the factor weights. There is nothing in the
model to require this, and, having found the $l$'s, we are at liberty to transform the $\xi$'s
how we like in the $m$-dimensional space of $\xi$'s. We can, in short, rotate the factors.
We can even transform them to non-independent factors. We have, so to speak,
estimated the factor space but are not committed to any particular co-ordinate system
within it. There are infinitely many choices, and which we take depends on non-statistical
considerations in any particular case. Two criteria suggest themselves:




*’ £=, v sri““sr7 s F i ™' ■ ■ * 

at the expense of others- but it ^ va lue of some / s can only be carried out 
a variable ’ may aIso lea d to the identification of a factor with 
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t, the factor ti> #3 to x i the factors f. and £„ anr i cn m,. 

ioadi - •*- * - - ** - ws 
•V^rrr r?F t ’ 

Ives correlated. We shall not enter into a discussion of these tnnirc „ u- u 
"Jhave scarcely reached a stage of development in which a critical review P of theoreti 

fe ° n ? ag ! in the f“ computer has come to the a °d o'f 

^Xlogists by enabling them to specify sundry criteria to determine rotations o 
Ifiral simplrfication and to solve the resulting equations, but even the computer 
2 find it hard to provide accurate information about the sampling distributions of 

^resulting estimators. 

43.36 A word of warning may be desirable against attempts at component or
factor analysis of matrices which are not obtained by product-moment methods. For
instance, the elements of a correlation matrix may be estimated by tetrachoric or biserial
coefficients (cf. 26.27-33, Vol. 2). If they are, the matrix is not necessarily positive
definite, and in certain cases some of the latent roots may turn out to be negative.


EXERCISES 

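43.1 A $p$-variate complex has the following correlation matrix:

\[ \begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{p-1} \\
\rho & 1 & \rho & \cdots & \rho^{p-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{p-3} \\
\vdots & \vdots & \vdots & & \vdots \\
\rho^{p-1} & \rho^{p-2} & \rho^{p-3} & \cdots & 1
\end{pmatrix} \]

Show that the determinant of the matrix is $(1-\rho^2)^{p-1}$ and hence that the complex cannot be
represented in fewer than $p$ dimensions.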

43.2 Show that if $\rho > 0$ the complex of Example (not Exercise) 43.1 has one greatest
latent root and that all the others are equal. Verify that the sum of the latent roots is $p$.

43.3 The correlation between variables $j$ and $k$ in a $p$-variate complex is $1 - |j-k|/p$.
Show that the complex cannot be represented in fewer dimensions. For the case $p = 4$ show
that the latent roots are $(2 \pm \sqrt{2})/4$ and $(6 \pm \sqrt{26})/4$.

43.4 Show that if the latent roots of a dispersion matrix $A$ are typified by $\lambda$, those of $A^2$
are $\lambda^2$. Show that for large $k$ the matrix $A^k$ tends to have diagonals which are $\lambda_1^k$ times the
squares of the values of the latent vector, $\lambda_1$ being the largest latent root of $A$.

43,5 In the notation of 43.20, if 

A — | | B = | y a jj | 

c= 0 Vm I T> _ IVajJ Yaa\ 


0 

Vena 

11 

p 

Ycua 

1 

Yaa 

Yah 

\Yaa 

Yab i 


X 
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advanced th»o«» O' >™<» 





show 


^ coefficient Z defined by 
of the vector a ten Dl(AB) 


and the square Z 

, otions of the variables. Show also that 

. i; ne ar transformations 

are invariant under ^ n p .- 

/-1 


2 = s d-P5)- 

j=l 


where the p’s are cam 


l0 nical correlations. 


(Hotellin, 


e, 


^36) 


of the previous exercise / and , being the sampie values ot * 


43.6 In the notation or tn p corre lations are all distinct, 
show that if the popu ation c ^ p ^ —p]) 2 

V ar k = - K 2 2 - 


and; 


M 


i=l Pi 


In particular, when p = 2, 


4 ? 

var z = - Z 2 1 j p"j 
n j = i 

cov (£,*) = (1-Pj). 

^ i=i 

var k = - {(1 -K 2 ) 2 - Z{ 1 + /C 2 )}, 

n 

4Z 2 

var Z =-(l-Z+K 2 ) 


cov (*, *) = - - KZ (1 + Z-K 2 ). 


(Hotelling, 1936) 


43.7 In the previous exercise, with p = , = 2, show that, in standard measure, 

h _ ’•nfj.-rii fn 

.. . . W-rSsKl-t!,)}! 

hence denve a tea, of the hypothesis that the « tetrad difference - is , 

(Hotelling, 1931 

«.8 In the notation of Exercise 43.6, show that 

4 ])}V{ ^ n -9-j)}V{^ n +a+2P-j)y 

(Girshick, 193! 

,h ” * "***-£™ol° f 0 ? Z" “s^'^rr 11 ma,rix f e< l ua I> say, to unis, - 

s as mean unity and variance (p+\)l n • 

(Girshick, 193 1 










Chow that the distribution of the tetrad difference of Exerd,. 4, , 
S , h ° uncorrelated parents, is given by zeroise 43 . 7 , 


2 ) r 2 _(^n) f 1 f 1 {tv-u) n -*dtdv 
[£(n-1)} J „ J u/t {(1 -* 2 )(i _ u2 yjj du. 

jn the notation of 43.29 show that 

4 3 : 11 1 H = l'S-i(c-2)2-il 


denoted by «, 


(Girshick, 1939) 


iial to 


j 2 a nd hence is diagonal. 


1 /j 

lS (Lawley and Maxwell, 1963) 

,, 4 2 If the “ error ” variances in a factor analysis are at choice show that th. , 
reduce the number of factors ... :c * ^ at t3le y can be 


chosen 


so as 


h\ 


to reduce the number of factors required to in if 

P>¥p-m)(p-m + t). 


n 13 Consider a factor analysis with p = 2, m = 1. Write down th* lil^un j t 
d show by differentiation that the ML equations are llkehh °° d 


~ c izh+ [ Ch-~~)L = 0 


h 


On 


C22 \ - 7/ ^ c i2 4 = 0 . 


C 22 C\l 1% 




Hence that 
and thus that 

a %fk = 

This is an inadmissible result for the free estimation of the four parameters. Explain the
reason for its appearance.

43.14 Verify the value for the determinant (43.74) given at (43.75).




CHAPTER 44

DISCRIMINATION AND CLASSIFICATION

44.1 In this chapter we shall be concerned with problems of differentiating
between two or more populations on the basis of multivariate measurements. There
are three distinct classes of problem, which are often confused:

(a) Discrimination. We are given the existence of two populations and a sample of
individuals from each. The problem is to set up a rule, based on measurements
from these individuals, which will enable us to allot some new individual to the
correct population when we do not know from which of the two it emanates.

(b) Classification. We are given a sample of individuals, or the whole population,
and the problem is to classify them into groups which shall be as distinct as possible.
In discrimination the existence of the groups is given; in classification it is a matter
to be determined.

(c) Dissection. We are given a sample or a population and wish to divide it into
groups, irrespective of whether any natural division exists.


For example, given a set of individuals from two different races, we may wish to set
up a function which will enable us to allocate any freshly observed individual to the
correct race. This is a problem of discrimination. Or, given a population of unknown
origins, we may wish to see whether they fall into natural classes, natural in this sense
meaning that the members in a group are close together in resemblance, but that the
members of one group differ considerably from those of another. This is a problem
of classification. Finally, given a set of students with observed performances at an
examination, we may wish to divide their standard of success into firsts, seconds and
thirds, and the points where we effect this division are entirely arbitrary. This is a
problem of dissection, and presents itself even where the population is homogeneous.
In this chapter we shall discuss discrimination and classification, but not dissection.

Discrimination 

44.2 Before beginning the theory of the subject, it is worth considering whether
the problem as we have described it makes practical sense. We are given a set of
individuals each of which is known with certainty to belong to population A or to
population B. If we can acquire this knowledge with certainty for this group, why not
for any new individuals which we may meet? There are at least three types of case
providing an answer to this question.

(a) Lost information. We may wish to be able to assign to the correct sex a number
of human bones dug up on an archaeological site. While the beings were alive
there would have been no problem; essential information has crumbled
into dust.

(b) Unattainable information. A samnle f, • 

314 0S ^ )1 ^ recor ds may provide us with data 




U 


a 


y 


f 
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eternal symptoms and the existence of intern i j- 315 

. f nifl8 e t u e disease from external svmntr,™ ., disease. n,, 

, lo *fftbc ob j ects ™y be *» diagnose and""* ^au^T'^tan” 
V “^inatio" 13 avoided. at a„ early stag ^ nd 

! V! eX It may have been found from past experience ,h . 

jr^certain types of behaviour, for example of economic ““ dis «™»ate 

(‘> pi< va ,ions made at a previous point of time. We rely JoU ■ °" ** Wu 

»f» bse ‘Idnt of time in order to predict the behaviour [„ the^T^ 0 " 5 atthe 


.f» b ^ point of time 

preset r - — 1C . 

44.3 It is important to note that we shall, in the first instance, be assigning an
individual to one of two classes without provision for suspended judgement.
That is to say, it is mandatory to assign to one of the classes, even at the risk of error.
When we make the assignment we may commit two kinds of mistake, according to the
class to which we wrongly allocate a given member, and we shall assume in the
first instance that the two types are equally important.

Consider then a space $W$ of $p$ dimensions in which a sample member is represented
by a point whose coordinates are the $x$ values. The two populations are represented
by clusters of points (or continuous densities) which are separate (for otherwise they
would be indistinguishable by means of $x$-values alone) but to some extent overlapping
(for otherwise there would be no problem of discrimination). We wish to set
a boundary in the space such that as many as possible of population 1 lie on one
side and as many as possible of population 2 on the other. And we require the boundary
to have a fairly simple shape. If $f_1$ and $f_2$ represent the respective frequency functions
we require our boundary to determine a region $R$ such that

\[ \int_R f_2\,dx = \int_{W-R} f_1\,dx = 1 - \int_R f_1\,dx. \tag{44.1} \]

This is equivalent to

\[ \int_R (f_1 + f_2)\,dx = 1. \tag{44.2} \]

This condition means, in effect, that the probabilities of misallocation are the same for
the two kinds of error. We further wish to minimize the total error, which is equivalent
to minimizing one of the types of error, say

\[ \int_R f_2\,dx = \text{minimum}. \tag{44.3} \]

The problem is then to find an unconditional minimum of

\[ \int_R \{ f_2 - \lambda(f_1 + f_2) \}\,dx \tag{44.4} \]

or, equivalently, of

\[ \int_R (\mu f_2 - f_1)\,dx, \tag{44.5} \]

the constants $\lambda$ or $\mu$ being determined from (44.2). This is clearly achieved by
putting into $R$ all those points, and only those points, for which $\mu f_2 - f_1 < 0$.
The boundary of $R$ is therefore given by

\[ f_1 / f_2 = \mu, \tag{44.6} \]

that is to say, by a ratio of likelihoods.
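
The rule can be illustrated in the simplest case. The sketch below assumes two univariate normal populations with a common standard deviation (hypothetical means 0 and 2), for which symmetry gives $\mu = 1$ and a boundary midway between the means; the function names are illustrative only.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def allocate(x, mean1, mean2, sd, mu=1.0):
    """Allot x to population 1 when f1/f2 > mu (x lies in R), otherwise to population 2."""
    ratio = normal_pdf(x, mean1, sd) / normal_pdf(x, mean2, sd)
    return 1 if ratio > mu else 2

mean1, mean2, sd = 0.0, 2.0, 1.0
boundary = 0.5 * (mean1 + mean2)   # with mu = 1, R is the set x < boundary

# Probability that a member of population 1 falls outside R, and of population 2 inside R.
error_1 = 1.0 - normal_cdf(boundary, mean1, sd)
error_2 = normal_cdf(boundary, mean2, sd)

# Condition (44.2): the integral of f1 + f2 over R equals unity.
condition = normal_cdf(boundary, mean1, sd) + normal_cdf(boundary, mean2, sd)
print(error_1, error_2, condition)
```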


44-4 This is i g* „ 

rrSdilt^hut a nearness in probability. The probabi,i ty of ^i* J 


0 


a metrical uiaiauw ~ ^ r 

for either type of error is gi Y 


f,dx 


fJh>P 


= f fidx. 


lassi «%J 

( 44 . 7 ) 


V 


We have, in fact, re-proved the Neyman Pearson lemma of 22.10, Vol. 2. 

44.5 Now suppose that the two populations are multivariate normal with means
$\mu_1$ and $\mu_2$ and identical dispersion matrices $\gamma$. Apart from constants, the logarithm
of the ratio of likelihoods is then, with $\Gamma$ inverse to $\gamma$,

\[ -\tfrac12 \sum \Gamma_{jk} \{ (x_j - \mu_{1j})(x_k - \mu_{1k}) - (x_j - \mu_{2j})(x_k - \mu_{2k}) \}
 = \sum \Gamma_{jk}(\mu_{1j} - \mu_{2j})x_k - \tfrac12 \sum \Gamma_{jk}(\mu_{1j}\mu_{1k} - \mu_{2j}\mu_{2k}). \tag{44.8} \]

The second part of this expression is a constant, and without loss of generality we
take our boundary to be determined by

\[ \sum \Gamma_{jk}(\mu_{1j} - \mu_{2j})x_k = \text{constant}. \tag{44.9} \]

This is a parental form. If we are given a sample with means $\bar{x}_1$, $\bar{x}_2$ and pooled
dispersions $c_{jk}$, the sample boundary function is, with $C$ inverse to $c$,

\[ \sum C_{jk}(\bar{x}_{1j} - \bar{x}_{2j})x_k = \text{constant}. \tag{44.10} \]

44.6 The same result may be reached by a different route. Suppose that we
determine a linear function

J 


V 


X = LL 


-j Xj 


° “* mffiimize the rati ° of between-class io within-clasa v • <44 ' n) 

vanances ' “■** 

l differentiation wi«h respect to l } gives Vs ^ 


( 44 . 12 ) 


)m which 


We have 


fhi - // 2 . = f ^/c4 


^VkYjk 


din £ back to (44 o\ Q . ^ * ? ** ~A* v ), 

ns * to measum th P ^ nCe ° Ur fui *tion X ' 
istant. Ure the distance betw^ / 




T> 


( 44 . 13 ) '■ 


1 Junction X is 

between them ? SC ° n ^ *° se parate the two popula- 
m ’ we may multinlv ,7 k„ ™ni*nt 











P&r: 






DISCRIMINATION AND CLASSIFICATION 

44-7 jfcuta to'fhe fee joining m th7^W'^es'hyperplane (44.9) 
To see this, make a linear transformation of the variables to new ones which
are independent and have unit variance. Since $f_1$ and $f_2$ have the same dispersion
matrix, the same transformation reduces each to the required form. The discriminating
plane becomes

\[ \sum (\mu_{1j} - \mu_{2j}) x_j = \text{constant}, \]

which is perpendicular to the line joining the centres, whose direction cosines are
proportional to $\mu_1 - \mu_2$. The distributions $f_1$ and $f_2$ are at the same time transformed
to spherically symmetric functions, and without loss of generality we may take one
co-ordinate axis along the line of means. The integrals giving the errors of classification
then reduce to univariate normal integrals and are clearly equal when the discriminating
boundary bisects the line of means.

This determines the constant in (44.10). If $\bar{X}_1$ is the mean of the left-hand side
with respect to $f_1$, and $\bar{X}_2$ is that with respect to $f_2$, the constant is halfway between them,
i.e. is equal to $\tfrac12(\bar{X}_1 + \bar{X}_2)$. Without losing generality, we henceforth assume that


Example 44.1 (Fisher, 1936)

Table 44.1 gives the measurements in centimetres of four variables on 50 flowers from each of three varieties of Iris, namely setosa (S), versicolor (Ve) and virginica (Vi). Consider the discrimination of S from Ve. The variables are:

    x_1 = sepal length
    x_2 = sepal width
    x_3 = petal length
    x_4 = petal width.

The means were (in centimetres):

    Variate    Versicolor    Setosa    Difference
    x_1          5.936       5.006       0.930
    x_2          2.770       3.428      -0.658
    x_3          4.260       1.462       2.798
    x_4          1.326       0.246       1.080
                                                  (44.14)


The pooled sums of squares and products about the means were (in cm$^2$):

             x_1        x_2        x_3        x_4
    x_1    19.1434     9.0356     9.7634     3.2394
    x_2               11.8658     4.6232     2.4746
    x_3                          12.2978     3.8794
    x_4                                      2.4604
                                                  (44.15)

The inverse matrix is, in cm$^{-2}$:

             x_1          x_2          x_3          x_4
    x_1    0.1187161   -0.0668666   -0.0816158    0.0396350
    x_2                 0.1452736    0.0334101   -0.1107529
    x_3                              0.2193614   -0.2720206
    x_4                                           0.8945506
                                                  (44.16)


Questions of degrees of freedom sometimes arise in this class of work, but since the object of the discriminant function is to separate, it can absorb an arbitrary constant in the coefficients. We shall take the total sample number 100 as the number of d.fr. in (44.15). The values in (44.16) are then to be multiplied by 100 to get the inverse of the dispersion matrix.

Using (44.10) we then find for the coefficients

$$l_1 = (11.87161)(0.930)-(6.68666)(-0.658)-(8.16158)(2.798)+(3.96350)(1.080) = -3.11511,$$
$$l_2 = -18.39075, \qquad l_3 = 22.21044, \qquad l_4 = 31.47374. \tag{44.17}$$

We may multiply these coefficients by any convenient constant. Taking the coefficient $l_1$ to be unity would, for example, give us

$$Z = x_1 + 5.9037x_2 - 7.1299x_3 - 10.1036x_4. \tag{44.18}$$

The mean value of $X$ for versicolor, obtained by substituting means in (44.17), is 66.917. That of setosa is -38.424. The mid-point is 14.247. Thus for any value of $X$ above 14.247 we assign to versicolor; in the converse case to setosa.
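The arithmetic of this example can be checked with a short NumPy sketch. It assumes the figures of (44.14) and (44.15) as printed and the divisor 100 adopted above; the variable names and the helper `allocate` are illustrative, not part of the original text.

```python
import numpy as np

# Pooled sums of squares and products about the means, (44.15), in cm^2.
S = np.array([
    [19.1434,  9.0356,  9.7634, 3.2394],
    [ 9.0356, 11.8658,  4.6232, 2.4746],
    [ 9.7634,  4.6232, 12.2978, 3.8794],
    [ 3.2394,  2.4746,  3.8794, 2.4604],
])
mean_ve = np.array([5.936, 2.770, 4.260, 1.326])  # versicolor means, (44.14)
mean_s = np.array([5.006, 3.428, 1.462, 0.246])   # setosa means, (44.14)

# Dispersion matrix with 100 taken as the number of d.fr. (the text's convention).
c = S / 100.0

# Coefficients l = C (xbar_1 - xbar_2), i.e. the sample boundary (44.10).
l = np.linalg.solve(c, mean_ve - mean_s)

# Means of X for the two varieties and the mid-point used as critical value.
X_ve = float(l @ mean_ve)
X_s = float(l @ mean_s)
midpoint = 0.5 * (X_ve + X_s)

def allocate(x):
    """Assign a flower with measurements x to the variety on its side of the mid-point."""
    return "versicolor" if float(l @ np.asarray(x)) > midpoint else "setosa"

print(np.round(l, 5), round(X_ve, 3), round(X_s, 3), round(midpoint, 3))
```

Solving the linear system rather than multiplying by the printed inverse avoids carrying the rounding of (44.16); the two routes agree to the accuracy quoted in the text.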

44.8 We may calculate approximately the probability of misclassification. We have

$$\operatorname{var}X = \sum_{j,k} l_j l_k\gamma_{jk} = \sum_{j,k,m} l_j\gamma_{jk}\Gamma_{km}(\mu_{1m}-\mu_{2m}) = \sum_j l_j(\mu_{1j}-\mu_{2j}),$$

which is estimated using (44.11) by

$$\operatorname{var}X = \bar X_1-\bar X_2. \tag{44.19}$$

This is the estimated variance of a single value of $X$. The variance of half the difference of two mean values, each based on $n$ observations, is $(\bar X_1-\bar X_2)/2n$. If we take the critical value of $X$ to be the mid-point $\tfrac12(\bar X_1+\bar X_2)$, the probability of misclassification is approximately that of a normal deviate exceeding $\tfrac12(\bar X_1-\bar X_2)$ in a distribution with standard deviation $\sqrt{(\bar X_1-\bar X_2)}$.

Example 44.2

In the data of Example 44.1,

$$\bar X_1 = 66.917, \qquad \bar X_2 = -38.424.$$

Hence a single value of $X$ has estimated variance 105.341, and the mid-point has variance $105.341/2n = 1.05341$. The error of misclassification is equal to the probability of a deviation of 52.67 or more from the mean in a normal distribution with standard deviation $\sqrt{105.341} = 10.26$. This is negligible.

44.9 An interesting special case occurs when all the correlations between the $x$'s are equal. It is not uncommon in biological work for the correlations to be nearly the same in magnitude. If so, we can discriminate on two factors, size and shape, as follows.

As in Example 43.1, it may be shown that if the correlations are all equal to $\rho$, the latent roots of the correlation matrix are given by

$$\lambda_1 = 1+(p-1)\rho, \tag{44.20}$$
$$\lambda_2 = \lambda_3 = \dots = \lambda_p = 1-\rho. \tag{44.21}$$

It therefore contains one major component, the rest being isotropic. The latent vector corresponding to $\lambda_1$ has all its elements equal to

$$1/\sqrt p. \tag{44.22}$$

We take a "size" component proportional to this and write, the $x$'s being in standard measure,

$$Q = \sum_{i=1}^p x_i, \tag{44.23}$$

so that

$$\operatorname{var}Q = p\{1+(p-1)\rho\}. \tag{44.24}$$

Among the remaining components no one stands out in advance of the others. Let us then take a set of weights $w_i$ with non-zero mean $\bar w$ and define a "shape" component by

$$P = \sum_{i=1}^p \frac{w_i-\bar w}{\bar w}\,x_i. \tag{44.25}$$

We find that

$$\operatorname{var}P = (1-\rho)\sum_{i=1}^p\left(\frac{w_i-\bar w}{\bar w}\right)^2. \tag{44.26}$$

Further,

$$\operatorname{cov}(Q,P) = \operatorname{cov}\Big(\sum_i x_i,\ \sum_j \frac{w_j-\bar w}{\bar w}x_j\Big) = \sum_j\frac{w_j-\bar w}{\bar w}\operatorname{var}x_j + \sum_{i\neq j}\frac{w_j-\bar w}{\bar w}\operatorname{cov}(x_i,x_j) = \{1+(p-1)\rho\}\sum_j\frac{w_j-\bar w}{\bar w} = 0. \tag{44.27}$$

The size and shape components are then uncorrelated.
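44.10 To arrive at a discriminator we take as weights the differences between the means,

$$w_j = \bar x_{1j}-\bar x_{2j}, \tag{44.28}$$

and look for a discriminator of form

$$X = \alpha Q + P. \tag{44.29}$$

Writing $D_P = \bar P_1-\bar P_2$, $D_Q = \bar Q_1-\bar Q_2$, we have then to maximize

$$\frac{(\bar X_1-\bar X_2)^2}{\operatorname{var}X} = \frac{(\alpha D_Q+D_P)^2}{\alpha^2\operatorname{var}Q+2\alpha\operatorname{cov}(Q,P)+\operatorname{var}P}. \tag{44.30}$$

We find easily the solution

$$\alpha = \frac{D_Q\operatorname{var}P - D_P\operatorname{cov}(Q,P)}{D_P\operatorname{var}Q - D_Q\operatorname{cov}(Q,P)},$$

which, when the covariance vanishes as in (44.27), reduces to

$$\alpha = \frac{D_Q\operatorname{var}P}{D_P\operatorname{var}Q}. \tag{44.31}$$

Substituting now the weights from (44.28) in the expressions for $D_P$ and $D_Q$ we have

$$D_P = p(\bar x_1-\bar x_2)v^2, \tag{44.32}$$
$$\operatorname{var}P = p(1-\rho)v^2, \tag{44.33}$$
$$D_Q = p(\bar x_1-\bar x_2), \tag{44.34}$$
$$\operatorname{var}Q = p\{1+(p-1)\rho\}, \tag{44.35}$$

where $\bar x_1-\bar x_2$ denotes the mean of the $p$ weights $w_j$ and $v$ is their coefficient of variation. Substitution in (44.31) then gives $\alpha$ and we find

$$X = \frac{1-\rho}{1+(p-1)\rho}\,Q + P. \tag{44.36}$$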


Example 44.3

Consider again the Iris data of Examples 44.1 and 44.2. The correlation matrix is:

            x_1       x_2        x_3        x_4
    x_1      1      0.599513   0.636323   0.472011
    x_2               1        0.382719   0.457988
    x_3                          1        0.705258
    x_4                                     1
                                                  (44.37)

The correlations are near enough to equality to justify the use of the foregoing as an approximation. We reduce the variables to zero means and unit variances to obtain:

                 Ve          S         Ve - S
    x_1        1.0628     -1.0628      2.1256
    x_2       -0.9551      0.9551     -1.9102
    x_3        3.9894     -3.9894      7.9788
    x_4        3.4426     -3.4426      6.8852
    Sum = Q    7.5397     -7.5397     15.0794 = D_Q
                                                  (44.38)


The variance of $Q$ is calculated as the sum of the 16 elements in (44.37) and is given by

$$\operatorname{var}Q = 10.5076. \tag{44.39}$$

The weights are obtained from (44.38) by dividing the last column by its mean ($\tfrac14\times 15.0794$) and subtracting unity, giving -0.4362, -1.5067, 1.1165, 0.8264. Hence the estimate of $\bar P$ for versicolor is

$$(-0.4362\times 1.0628) + \text{etc.} = 8.2747,$$

and the variance of $P$ is found similarly. The covariance of $Q$ and $P$ does not vanish, and we calculate it as 0.36162. Substitution then gives $\alpha = 0.2412$ and our discriminator is

$$0.2412\,Q + P, \tag{44.43}$$

where $Q$ is the sum of the $x$'s in standard measure and $P$ is the shape component formed with the weights given above.
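The size-and-shape calculation of this example can be retraced numerically. The sketch below assumes the correlations of (44.37) and the standardized differences of (44.38) as printed, and uses the general maximizer of (44.30) with the covariance term retained (an assumption, chosen because the sample covariance is not exactly zero); all variable names are illustrative.

```python
import numpy as np

# Correlation matrix (44.37).
R = np.array([
    [1.0,      0.599513, 0.636323, 0.472011],
    [0.599513, 1.0,      0.382719, 0.457988],
    [0.636323, 0.382719, 1.0,      0.705258],
    [0.472011, 0.457988, 0.705258, 1.0],
])
# Standardized mean differences Ve - S, last column of (44.38).
d = np.array([2.1256, -1.9102, 7.9788, 6.8852])

ones = np.ones(len(d))
a = d / d.mean() - 1.0          # shape weights (w_i - wbar) / wbar with w = d

D_Q = d.sum()                   # difference of the size component between varieties
D_P = float(a @ d)              # difference of the shape component
var_Q = float(ones @ R @ ones)  # sum of all 16 correlations
var_P = float(a @ R @ a)
cov_QP = float(ones @ R @ a)

# Value of alpha maximizing (alpha*D_Q + D_P)^2 / var(alpha*Q + P).
alpha = (D_Q * var_P - D_P * cov_QP) / (D_P * var_Q - D_Q * cov_QP)

print(np.round(a, 4), round(var_Q, 4), round(cov_QP, 5), round(alpha, 4))
```

Dropping the covariance term, as the equal-correlation theory would allow, gives a noticeably different multiplier, which is why the retained form is used here.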


Quadratic discriminators

44.11 The linear discriminator depends on the assumption that the populations under consideration have equal dispersions. If this is not so, our logarithm of the likelihood ratio becomes, ignoring constants,

$$-\tfrac12\sum_{j,k}\{\Gamma^{(1)}_{jk}(x_j-\mu_{1j})(x_k-\mu_{1k}) - \Gamma^{(2)}_{jk}(x_j-\mu_{2j})(x_k-\mu_{2k})\}, \tag{44.44}$$

where $\Gamma^{(1)}$ and $\Gamma^{(2)}$ are inverse to the two dispersion matrices. The terms in $x_jx_k$ no longer cancel, and our boundary becomes a quadric in the $x$'s. This is, in general, an awkward construct to handle, which probably accounts for the fact that quadratic discriminators have not come into general use.

We can make some progress if we reduce the situation to one of size and shape. We now have $p = 2$ and the covariance terms vanish. Expression (44.44) then reduces to a form of type

$$\frac{(y-\nu_1)^2}{\operatorname{var}y} + \frac{(x-\nu_2)^2}{\operatorname{var}x}, \tag{44.45}$$

where $y$ and $x$ are linear functions of $P$ and $Q$ and the variances are calculable. The discriminating boundary then becomes an ellipse (in two dimensions) and the situation is tractable. Reference may be made to C. A. B. Smith (1947) for an example.


Testing of a discriminant function

44.12 The process of testing a discriminator needs a little clarification. We may suspect that there is a real difference between the populations but that they are so close together that a discriminator is not very effective; this is measured by the errors of misclassification, which, although minimal, may still be large. Or we may think that there is a larger difference between the populations, but our sample size is not large enough to provide a reliable discriminator; this is really a matter of setting confidence intervals to the coefficients. Or we may fear that the parents are identical and that a discriminator is illusory.

Tests of discriminant functions have usually been discussed in terms of the last of these possibilities. They are not so much tests of the functions as tests of homogeneity. If heterogeneity is found, the function, ipso facto, is significant in the sense that it discriminates between real differences in an optimal way when sample means are used instead of the unknown parent values. But that way may not be very good even if it is the best available.

44.13 Consider the statistic

$$D^2 = \sum_{j,k}C_{jk}(\bar x_{1j}-\bar x_{2j})(\bar x_{1k}-\bar x_{2k}). \tag{44.46}$$

The term $\bar x_{1j}-\bar x_{2j}$ is the difference of two means, normally distributed, and is therefore distributed like a mean about zero with twice the variance of a single mean if the sample sizes are the same. It follows that $D^2$ is distributed as a multiple of Hotelling's $T^2$, based on $2n$ observations. This is equivalent to the distribution of the multiple correlation $R^2$ when the parent value is zero, and a test can be carried out by an analysis of variance. It seems preferable, however, to test homogeneity in the manner of Chapter 42, which enables us to consider differences in means and dispersions separately.

44.14 Since we observed in 27.28-9 that the distribution of $R^2$ does not require the multinormality assumption, it is no surprise that discrimination with two populations does not require it either. Exercise 44.10 shows how we may derive the boundary (44.9) from a LS analysis.

44.15 We now consider the case where the two types of error are not equally important. There are two ways in which our previous results may require modification:

(a) It may be known that members from population 1 have a different chance from those of population 2 of being chosen. For example, in selecting a batch of individuals at random to see if they have a particular disease, there may be many times more healthy than unhealthy individuals.

(b) The consequences of misallocation may be seriously different. It is less dangerous to diagnose a healthy person as unhealthy (where the mistake is likely to be discovered before serious harm is done) than to diagnose an unhealthy person as healthy (where the reverse may be true).

Suppose that the prior probabilities of a member from our two populations are $\pi_1$, $\pi_2$ ($\pi_1+\pi_2 = 1$). Let us further suppose that we can attach numerical weights to mistakes, a misclassification costing us $c_1$ and $c_2$ respectively. Instead of now minimizing mistakes in number we minimize cost. Then instead of (44.3) we have to minimize

$$c_1\pi_1\int_{R_2}f_1\,dx + c_2\pi_2\int_{R_1}f_2\,dx. \tag{44.47}$$

This is minimized when the boundary is determined by

$$\frac{c_2\pi_2f_2}{c_1\pi_1f_1} = 1. \tag{44.48}$$

Thus if we work with $\log(f_1/f_2)$ as discriminator the effect of introducing the prior probabilities and the costs is simply to add a constant to the critical value by which we discriminate.
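The shift in the critical value can be written out directly. The sketch below assumes that $c_1$ is the cost of misallocating a member of population 1 and $c_2$ that of misallocating a member of population 2, so that we allot to population 1 when $c_1\pi_1f_1 > c_2\pi_2f_2$; the function names are illustrative only.

```python
import math

def critical_value(c1, c2, pi1, pi2):
    """Value that log(f1/f2) must exceed for allocation to population 1.

    With equal costs and equal prior probabilities this is zero; otherwise it
    is the constant log(c2*pi2 / (c1*pi1)) added to the critical value.
    """
    return math.log((c2 * pi2) / (c1 * pi1))

def allocate(log_ratio, c1=1.0, c2=1.0, pi1=0.5, pi2=0.5):
    """Return 1 or 2 according to the side of the shifted boundary on which log(f1/f2) falls."""
    return 1 if log_ratio > critical_value(c1, c2, pi1, pi2) else 2

# Population 1 nine times as common as population 2: weak evidence against
# population 1 no longer suffices to allocate to population 2.
print(critical_value(1.0, 1.0, 0.9, 0.1), allocate(-1.0, pi1=0.9, pi2=0.1))
```

Only the ratio $c_2\pi_2/c_1\pi_1$ matters, so costs and prior probabilities need be known only up to a common factor.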



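More than two populations

44.16 When we proceed from discrimination between two populations to discrimination among a number of populations an essentially new point appears. We have to divide the sample space into mutually exclusive regions and allot a member according to the region into which it falls. Instead of a single discriminant function we must, to achieve optimal properties, have several functions, or, if we insist on having a single function, we shall have to sacrifice some discriminatory power.

44.17 It will be enough to consider three populations, the generalization to more being straightforward. Supposing that the prior probabilities are respectively $\pi_1$, $\pi_2$, $\pi_3$ ($\pi_1+\pi_2+\pi_3 = 1$) and the corresponding frequency functions are $f_1$, $f_2$, $f_3$, a generalization by C. R. Rao of the Neyman-Pearson lemma shows that the errors of misclassification are minimum when the regions are determined as follows: $R_1$ is such that $\pi_1f_1$ is greater than or equal to both $\pi_2f_2$ and $\pi_3f_3$; $R_2$ is such that $\pi_2f_2$ is greater than or equal to both $\pi_1f_1$ and $\pi_3f_3$; and similarly for $R_3$.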

44.18 In particular, if the three populations are normal with common dispersion matrix $\gamma_{jk}$ and means $\mu_{1j}$, $\mu_{2j}$, $\mu_{3j}$, it follows as in the manner of 44.5 that $R_1$ must be such that

$$\sum_{j,k}\Gamma_{jk}(\mu_{1j}-\mu_{2j})x_k \geq p_{12}, \text{ say}, \tag{44.49}$$
$$\sum_{j,k}\Gamma_{jk}(\mu_{1j}-\mu_{3j})x_k \geq p_{13}, \text{ say}. \tag{44.50}$$

Similarly for the other regions. In the sample, $R_1$ will be determined as the domain lying between the two hyperplanes (44.49) and (44.50) and including the mean of population 1; and so on. The surfaces of constant weighted probability ratio for populations 1 and 2 are, in fact, given by

$$\log\frac{\pi_1f_1}{\pi_2f_2} = \sum_{j,k}\Gamma_{jk}(\mu_{1j}-\mu_{2j})x_k - \tfrac12\sum_{j,k}\Gamma_{jk}(\mu_{1j}\mu_{1k}-\mu_{2j}\mu_{2k}) + \log\frac{\pi_1}{\pi_2}. \tag{44.51}$$

In the particular case where all the $\pi$'s are equal we may compare the three functions

$$X_1 = \sum_{j,k}\Gamma_{jk}\mu_{1j}x_k - \tfrac12\sum_{j,k}\Gamma_{jk}\mu_{1j}\mu_{1k}, \tag{44.52}$$
$$X_2 = \sum_{j,k}\Gamma_{jk}\mu_{2j}x_k - \tfrac12\sum_{j,k}\Gamma_{jk}\mu_{2j}\mu_{2k}, \tag{44.53}$$
$$X_3 = \sum_{j,k}\Gamma_{jk}\mu_{3j}x_k - \tfrac12\sum_{j,k}\Gamma_{jk}\mu_{3j}\mu_{3k}, \tag{44.54}$$

and allot a member to $R_1$, $R_2$, $R_3$ according to which of the $X$'s is the greatest when the sample values are substituted. For if, say, $X_1$ is the greatest, it follows from (44.51) that $\pi_1f_1$ exceeds $\pi_2f_2$ and $\pi_3f_3$. As usual we may substitute sample values for the unknown parameters in these equations to get an approximate discriminator.


Example 44.4 (C. R. Rao and Slater, 1949)

Samples of persons falling into certain neurotic groups obtained the following mean scores on three tests:

                                Mean score
    Group                    1         2         3
    Anxiety state          2.9298    1.1667    0.7281
    Hysteria               3.0303    1.2424    0.5455
    Psychopathy            3.8125    1.8438    0.8125
    Obsession              4.7059    1.5882    1.1176
    Personality change     1.4000    0.2000    0.0000
    Normal                 0.6000    0.1455    0.2182
                                                  (44.55)




The dispersion matrix within groups (250 d.fr.) was

            1           2           3
    1    2.300851    0.251578    0.474169
    2                0.607466    0.035774
    3                            0.595094
                                                  (44.56)

Its inverse is

            1           2           3
    1    0.543234   -0.200195   -0.420813
    2                1.725807    0.055767
    3                            2.012357
                                                  (44.57)


For the purposes of this example we will suppose all the $\pi$'s to be equal. The six discriminating functions of type (44.52) are then as follows:

                               Coefficients of
    Group                   x_1       x_2       x_3      Constant
    Normal                 0.2050    0.1431    0.1947    -0.0931
    Personality change     0.7204    0.0649   -0.5780    -0.5107
    Anxiety state          1.0515    1.4676    0.2974    -2.5047
    Hysteria               1.1678    1.5679   -0.1081    -2.7139
    Psychopathy            1.3599    2.4641    0.1336    -4.9182
    Obsession              1.7680    1.8611    0.3573    -5.8375
                                                  (44.58)
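The table of functions can be recomputed from the group means (44.55) and the inverse dispersion matrix (44.57). The following sketch assumes equal prior probabilities, as in the example, and scores the subject with test scores 1, 1, 0 considered in the text; the names `coef`, `const` and `allocate` are illustrative.

```python
import numpy as np

groups = ["Normal", "Personality change", "Anxiety state",
          "Hysteria", "Psychopathy", "Obsession"]
# Group means on the three tests, from (44.55), in the order of (44.58).
means = np.array([
    [0.6000, 0.1455, 0.2182],
    [1.4000, 0.2000, 0.0000],
    [2.9298, 1.1667, 0.7281],
    [3.0303, 1.2424, 0.5455],
    [3.8125, 1.8438, 0.8125],
    [4.7059, 1.5882, 1.1176],
])
# Inverse of the within-group dispersion matrix, (44.57), written out in full.
Gamma = np.array([
    [ 0.543234, -0.200195, -0.420813],
    [-0.200195,  1.725807,  0.055767],
    [-0.420813,  0.055767,  2.012357],
])

# Each function is sum_jk Gamma_jk mu_j x_k - (1/2) sum_jk Gamma_jk mu_j mu_k.
coef = means @ Gamma
const = -0.5 * np.sum(coef * means, axis=1)

def allocate(x):
    """Return the group whose linear function is greatest at x, with all six values."""
    scores = coef @ np.asarray(x, dtype=float) + const
    return groups[int(np.argmax(scores))], scores

group, scores = allocate([1, 1, 0])
print(group, np.round(scores, 4))
```

The second and first functions differ by only about 0.02 for this subject, which illustrates how close such a decision can be.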


Here the coefficient of $x_1$ for the normal state is

$$(0.543234)(0.6000) - (0.200195)(0.1455) + (-0.420813)(0.2182) = 0.2050.$$

Suppose, for example, we had a subject with scores 1, 1, 0. The values of the six functions, in the order of (44.58), are 0.2550, 0.2746, 0.0144, 0.0218, -1.0942, -2.2084. We should therefore allot the subject to the second group, personality change. In practice, of course, the decision is very close and there are only five members in the personality change group on which the sample discriminator is based.

44.19 From the geometrical viewpoint the discriminating functions represent hyperplanes in $p$ dimensions. Those of Example 44.4 are, of course, planes. As we have seen, they are not orthogonal to the lines joining the means of distributions. When we have more than two populations the means will not, in general, be collinear. We might, however, find the line of closest fit to the means and use variation in the direction of that line as a discriminator. Analogously, if we have $k$ populations we may seek for a function $X$ given by

$$X = \sum_{j=1}^p l_jx_j$$

such that the ratio of between-class to total variance is maximized. If $A$ is the dispersion matrix between classes and $B$ the total, this is equivalent to maximizing

$$\lambda = \frac{\sum_{j,k}l_jl_kA_{jk}}{\sum_{j,k}l_jl_kB_{jk}}, \tag{44.59}$$

which leads to

$$\sum_k(A_{jk}-\lambda B_{jk})l_k = 0. \tag{44.60}$$

Thus the largest latent root of $|A-\lambda B| = 0$ provides our discriminator. For details reference may be made to Bartlett (1951), E. J. Williams (1952), and Blackith (1960).

It appears to us that, in general, the use of one function for discrimination among several populations may be rather Procrustean unless they are so separate that almost any method will yield reasonable results.

Qualitative data

44.20 Our discussion so far has been in terms of measured variables. In practice we frequently have to deal with situations where some or all of the variables are qualitative, so that the data consist of frequencies in the sub-classes of a multiple classification. If $n_{1jk}$ and $n_{2jk}$ are the numbers of members of the two samples falling in sub-class $(j,k)$, we allot a new member in that sub-class to population 1 or population 2 according as

$$n_{1jk} > n_{2jk} \quad\text{or}\quad n_{1jk} < n_{2jk}. \tag{44.61}$$

Only the frequencies of the $(j,k)$th sub-class are relevant. All the other class frequencies tell us nothing about the membership of that class.

44.21 This seems a crude method of procedure, but it is in line with the criterion adopted for measured variables; for equation (44.61) means that we allocate a member to the class for which it has the greater probability of occurrence.

If we wish to take the matter further we may proceed in two ways:

(a) It might be possible to utilize information from cells outside the $(j,k)$th. So far as we know, this has not been attempted.

(b) If we are prepared to prescribe misclassification costs, a more sophisticated discriminator can be set up on the criterion that members are to be allocated so as to minimize the cost of misclassification over the whole table. Cochran and Hopkins (1961) examine the procedure. See also Linhart (1959).

44.22 When some variables are measured and others are qualitative, no fully satisfactory method seems to be available to deal with this situation. A rather heuristic approach is to construct a score from the qualitative variables (e.g. by representing a dichotomy by 0, 1, a trichotomy by -1, 0, 1, etc., and averaging the variables) and then to use that score as a measured variable in conjunction with the other measured variables. Alternatively, a separate discriminator can be constructed from the measured variables for each cell of the qualitative classification, a tedious procedure and one which is apt to reduce the sample numbers for each discriminator to a very low point of reliability. The subject would repay further study.


44.23 Before proceeding to consider distribution-free methods we deal briefly with 
a few points not yet discussed: (a) reserved judgement, (b) bias in the estimation of 
misclassification errors, (c) discarding redundant variables. 


Reserved judgement 

44.24 In many, perhaps most, problems in discrimination it is wise to allow for 
reserved judgement on borderline cases, and not to insist on an allocation to one of 
two classes. This means, in geometrical terms, that we wish to divide the sample 
space into three regions $R_1$, $R_2$ and $D_{12}$. If a member falls into $R_1$ we allocate it to population 1; if it falls into $R_2$, to population 2. If it falls into $D_{12}$ we admit that the data are insufficient to make a satisfactory judgement. This region, in general, will contain members of both populations fairly intimately mixed up together, and in practice we should probably seek for some other criterion to disentangle them.

It is not difficult to use the linear discriminator to set up the region $D_{12}$. We merely have to decide on what misclassification probabilities are tolerable, define $R_1$ and $R_2$ in terms of them, and assign $D_{12}$ to the remainder of the sample space.

44.25 With more than two populations the regions become more numerous. With three, for example, we may define regions of doubt $D_{12}$, $D_{23}$, $D_{31}$ in terms of the three discriminants, but these will intersect. Thus we may have a region $D_{123}$ wherein we cannot allocate to any population; a region $D_{12.3}$ where we can reject $R_3$ but cannot allocate as between $R_1$ and $R_2$; and so on. No particular difficulty arises, at least with linear problems.


Hi 


Example 44.1. »’ pti „ns it is b ‘ l shou ld pro bably wish to f to ^ 

is* sensitive to is . P» f^ever, involve a small bias. “> an, 

Observed s®P u ch eck. I‘ m 

44.27 There are, in practice, two sources of error in the empirical determination of the misclassification errors. First of all, we do not know the parent parameters, and our discriminant is estimated from the sample. On the average, the misclassification error of such a discriminant will be greater than the true value. Secondly, our empirical estimate is derived from data to which the discriminant has been fitted. Consequently the empirical estimate will, on the average, be less than it would have been had the discriminator been applied to a new sample; but this, as we have seen, would be greater than the true value.


Example 44.5 (Cochran and Hopkins, 1961)

The following simple example will exhibit the effect.

Suppose we have two populations $P_1$ and $P_2$ and a single variate which can take values $a_1$ and $a_2$. Let the true probabilities that a member in $P_1$ has the appropriate values be $\pi_1(a_1)$ and $\pi_1(a_2) = 1-\pi_1(a_1)$, and similarly for $\pi_2(a_1)$ and $\pi_2(a_2)$. If a sample of $n_1$ from $P_1$ bears $r_1$ values of $a_1$, the unbiassed estimator of $\pi_1(a_1)$ is $r_1/n_1$; and so forth.

The allocation rule will be to place a further observation $a_1$ in $P_1$ if the corresponding $r_1/n_1 > r_2/n_2$, and to place an observation $a_2$ in $P_1$ if $1-r_1/n_1 > 1-r_2/n_2$.

Now consider the case when the true probabilities are given by

             a_1      a_2
    P_1      0.9      0.1
    P_2      0.05     0.95

and suppose that the samples consist of one member from each population. This is very extreme, but will suffice to make the point. If the one in $P_1$ has value $a_1$ and that in $P_2$ has value $a_2$, the rule for the future is that every observation with $a_1$ is to be allocated to $P_1$ and every one with $a_2$ is to be allocated to $P_2$. The estimated misclassification probability is zero. The actual (but unobserved) probability is $\tfrac12(0.1+0.05) = 0.075$. Likewise if the value from $P_1$ is $a_2$ and that from $P_2$ is $a_1$ the decision rule is reversed; the estimated misclassification probability is zero. Actually it is $\tfrac12(0.9+0.95) = 0.925$.

If the two sample values are the same we either reserve judgement or toss up, and the estimated and actual probabilities of misclassification are both 0.5, on the assumption that the members we have were chosen at random. The results are:

                                               Prob. of misclassification
    Occurrence          Prob. of occurrence       Estimated     Actual
    P_1(a_1) P_2(a_2)   (0.9)(0.95) = 0.855          0.0         0.075
    P_1(a_1) P_2(a_1)   (0.9)(0.05) = 0.045          0.5         0.500
    P_1(a_2) P_2(a_2)   (0.1)(0.95) = 0.095          0.5         0.500
    P_1(a_2) P_2(a_1)   (0.1)(0.05) = 0.005          0.0         0.925
    Average                                          0.07        0.13875

This, of course, is a very extreme case. In sample sizes likely to be worth discussing in practice the bias is much smaller. Further discussion is given by Cochran and Hopkins (1961). See also John (1961). For a much more comprehensive discussion of error rates see Hills (1966).

If, of course, the initial sample on which we base our discriminator was not chosen at random, no quantitative estimate of bias, in general, is possible.
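The averages in this example follow from enumerating the four possible pairs of sample values. The sketch below assumes samples of one member from each population and a toss-up when the two values coincide, as in the example; the dictionaries and the function `error_rates` are illustrative names.

```python
from itertools import product

# True probabilities of the two variate values in each population.
p1 = {"a1": 0.9, "a2": 0.1}
p2 = {"a1": 0.05, "a2": 0.95}

def error_rates(v1, v2):
    """Estimated and actual misclassification probabilities of the rule fitted
    to one observation v1 from P1 and one observation v2 from P2."""
    if v1 == v2:
        return 0.5, 0.5  # no information: reserve judgement or toss up
    # Rule: value v1 is allotted to P1 and value v2 to P2. It errs on members
    # of P1 showing v2 and on members of P2 showing v1.
    actual = 0.5 * (p1[v2] + p2[v1])
    return 0.0, actual

estimated_avg = 0.0
actual_avg = 0.0
for v1, v2 in product(p1, p2):
    prob = p1[v1] * p2[v2]  # probability of observing this pair of samples
    est, act = error_rates(v1, v2)
    estimated_avg += prob * est
    actual_avg += prob * act

print(round(estimated_avg, 5), round(actual_avg, 5))
```

The gap between the two averages is the optimistic bias of the empirical error rate in this deliberately extreme case.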

Redundant variables: standard errors 

44.28 It is natural to enquire whether all the variables x which appear in our 
discriminator are necessary. One expects that discarding variables will weaken the
discriminatory power, but the loss may be negligible. Looked at from the geometrical
viewpoint, if our constellation of points in a p-dimensional space is satisfactorily divided
into two by the discriminating hyperplane, the same may be true if we project on to 
one of the co-ordinate hyperplanes, in which case the variable orthogonal to that plane 
is redundant. 

There are several ways of approaching this problem. It would save a good deal 
of trouble if we could discard unrewarding variables at the outset without bringing 
them into the analysis. This, however, is a hazardous operation—cf. some results of 
Cochran in Exercises 44.6-9. A more direct approach would be to estimate the 
misallocation errors by omitting certain variables, but this is apt to be tedious if the 
number of variables is large. We will consider a third approach by deriving the standard 
error (in large samples) of the coefficients in the linear discriminant. 

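The coefficient $l_k$ is given by $l_k = \sum_j C_{jk}(\bar x_{1j}-\bar x_{2j})$. Hence

$$dl_k = \sum_j\{(\bar x_{1j}-\bar x_{2j})\,dC_{jk} + C_{jk}\,d(\bar x_{1j}-\bar x_{2j})\}. \tag{44.62}$$

Likewise

$$dl_m = \sum_r\{(\bar x_{1r}-\bar x_{2r})\,dC_{rm} + C_{rm}\,d(\bar x_{1r}-\bar x_{2r})\}. \tag{44.63}$$

Remembering that means are independent of dispersions for normal variation, we have

$$\operatorname{cov}(l_k,l_m) = \sum_{j,r}(\bar x_{1j}-\bar x_{2j})(\bar x_{1r}-\bar x_{2r})\operatorname{cov}(C_{jk},C_{rm}) + \sum_{j,r}C_{jk}C_{rm}\operatorname{cov}\{(\bar x_{1j}-\bar x_{2j}),(\bar x_{1r}-\bar x_{2r})\}. \tag{44.65}$$

If the two samples are based on $n_1$ and $n_2$ observations we easily find

$$\operatorname{cov}\{(\bar x_{1j}-\bar x_{2j}),(\bar x_{1r}-\bar x_{2r})\} = \operatorname{cov}(\bar x_{1j},\bar x_{1r}) + \operatorname{cov}(\bar x_{2j},\bar x_{2r}) = \left(\frac1{n_1}+\frac1{n_2}\right)c_{jr}. \tag{44.66}$$

Let $|c|$ be the determinant of $c$ and $\Gamma_{jk}$ the co-factor of $c_{jk}$, so that $C_{jk} = \Gamma_{jk}/|c|$. Then

$$dC_{jk} = -\frac{\Gamma_{jk}}{|c|^2}\sum_{\alpha,\beta}\Gamma_{\alpha\beta}\,dc_{\alpha\beta} + \frac1{|c|}\sum_{\alpha,\beta}\Gamma_{jk,\alpha\beta}\,dc_{\alpha\beta} = \frac1{|c|^2}\sum_{\alpha,\beta}\{-\Gamma_{jk}\Gamma_{\alpha\beta} + |c|\,\Gamma_{jk,\alpha\beta}\}\,dc_{\alpha\beta}, \tag{44.67}$$

where $\Gamma_{jk,\alpha\beta}$ is the co-factor of $c_{\alpha\beta}$ in $\Gamma_{jk}$. Now in virtue of Jacobi's theorem on determinants

$$|c|\,\Gamma_{jk,\alpha\beta} = \Gamma_{jk}\Gamma_{\alpha\beta} - \Gamma_{j\beta}\Gamma_{k\alpha}.$$

Hence (44.67) reduces to

$$dC_{jk} = -\frac1{|c|^2}\sum_{\alpha,\beta}\Gamma_{j\beta}\Gamma_{k\alpha}\,dc_{\alpha\beta} = -\sum_{\alpha,\beta}C_{j\beta}C_{k\alpha}\,dc_{\alpha\beta}. \tag{44.68}$$

We now find, on using (41.98), that

$$\operatorname{cov}(C_{jk},C_{rm}) = \frac1{n_1+n_2}(C_{jm}C_{kr} + C_{jr}C_{km}). \tag{44.69}$$

Substituting from (44.66) and (44.69) in (44.65), we have

$$\operatorname{cov}(l_k,l_m) = \left(\frac1{n_1}+\frac1{n_2}\right)\sum_{j,r}C_{jk}C_{rm}c_{jr} + \frac1{n_1+n_2}\sum_{j,r}(\bar x_{1j}-\bar x_{2j})(\bar x_{1r}-\bar x_{2r})(C_{jm}C_{kr}+C_{jr}C_{km}) \tag{44.70}$$

$$= \left(\frac1{n_1}+\frac1{n_2}\right)C_{mk} + \frac1{n_1+n_2}\,l_ml_k + \frac1{n_1+n_2}\,C_{km}\sum_{j,r}(\bar x_{1j}-\bar x_{2j})(\bar x_{1r}-\bar x_{2r})C_{jr}. \tag{44.71}$$

In particular, since the last sum is $\bar X_1-\bar X_2$,

$$\operatorname{var}l_k = \left(\frac1{n_1}+\frac1{n_2}\right)C_{kk} + \frac1{n_1+n_2}\,l_k^2 + \frac1{n_1+n_2}\,C_{kk}(\bar X_1-\bar X_2). \tag{44.72}$$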


Consider again the Iris data of Example 44.1 and let us test $l_1$. We have

$$n_1 = n_2 = 50, \quad l_1 = -3.11511, \quad \bar X_1-\bar X_2 = 105.341, \quad C_{11} = 11.87161.$$

We find from (44.72)

$$\operatorname{var}l_1 = 0.4749 + 0.0970 + 12.5057 = 13.0776, \qquad \text{s.e.}(l_1) = 3.62.$$

The absolute value is less than the standard error, and we should consider whether $x_1$ can be discarded without serious loss of discriminating power.

In point of fact, as we shall see later, a good discriminator can be based on $x_4$ alone.

Cochran and Bliss (1948) considered the case where the effect of some variables is extracted by a covariance technique, and discrimination applied to the remainder. For the method and a worked example, reference may be made to their paper. For the use of the $D^2$ statistic and discrimination generally see C. R. Rao (1952).
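The large-sample variance used in this test is easy to evaluate. The sketch below implements (44.72) as reconstructed from the three terms quoted in the example, with the figures for $l_1$ from the Iris data; the function name and argument names are illustrative.

```python
import math

def var_coefficient(l_k, C_kk, X_diff, n1, n2):
    """Large-sample variance of a discriminant coefficient l_k, after (44.72).

    C_kk is the diagonal element of the inverse dispersion matrix and X_diff
    is the difference of the mean values of X in the two samples.
    """
    return (1.0 / n1 + 1.0 / n2) * C_kk + (l_k ** 2 + C_kk * X_diff) / (n1 + n2)

# Values quoted in the text for the coefficient of sepal length.
l_1, C_11, X_diff = -3.11511, 11.87161, 105.341
var_l1 = var_coefficient(l_1, C_11, X_diff, 50, 50)
se_l1 = math.sqrt(var_l1)

print(round(var_l1, 4), round(se_l1, 2), abs(l_1) < se_l1)
```

The third term dominates here, so the standard error is driven mainly by the sampling variation of the inverse dispersion matrix rather than by that of the means.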

Distribution-free methods

44.29 We proceed to discuss the possibility of distribution-free methods of discrimination for $k$ populations. Very little work has been done on this subject, and the following sections 44.29-33 should be regarded as suggestions which need to stand the test of experience.

Let us revert to the representation of members as points in a $p$-dimensional space whose co-ordinates are the values of the variables $x_1, x_2, \dots, x_p$. Confining ourselves for the present to two populations, we may think of one population (say $A$) as represented by crosses and the other (say $B$) by circles. In two dimensions the picture might look like Fig. 44.1. The crosses have a convex hull which we have drawn in; likewise for the circles. In general these two will have a common domain.

[Fig. 44.1]

Let us consider the following rule of discrimination:

(a) If a point falls in the $A$-hull but not in the $B$-hull we assign it to $A$.
(b) If it falls in the $B$-hull but not in the $A$-hull we assign it to $B$.
(c) If the point falls into both hulls we will not assign it to either.

The proposal is plausible but we shall not follow it up for three reasons:

(i) The determination of the convex hulls is a problem in linear programming which is soluble but takes us outside our present scope;
(ii) The method gives no guide to the treatment of new points which fall outside both hulls;
(iii) The method is not truly distribution-free, because non-linear variate transformations do not preserve the planarity of the hull boundaries.

A count of points in the two hulls and their common part is nevertheless useful as giving us a measure of the degree of entanglement of the two populations; a measure, so to speak, of the magnitude of the discrimination problem.

44.30 As a prelude to a distribution-free method, consider again the Table 44.1 data for setosa and versicolor.

The petal width of setosa has a mean value of 0.246 and a range of 0.2 to 0.6 (variance 0.0109). That of versicolor has a mean of 1.326 and a range of 1.0 to 1.8 (variance 0.0383). On this showing, as we have already remarked, petal width would be a perfectly good discriminator in itself. If we allot a new member to setosa or versicolor according as petal width is less than or exceeds, say, 0.9, we shall rarely make a mistake even if the variates are normal.

44.31 The method we propose may be illustrated on the discrimination of versicolor against virginica. A casual inspection of the data shows what can be confirmed by tabulation, that the two differ more on petal length PL and petal width PW than on sepal length or width. We form a frequency distribution for PL and PW as in Table 44.2.

Table 44.2. Frequency distributions of petal length and petal width

    Variate    Petal length      Variate    Petal width
    values     Vers.   Virg.     values     Vers.   Virg.
    <= 4.3      25      -
    4.4          4      -         1.0         7      -
    4.5          7      1         1.1         3      -
    4.6          3      -         1.2         5      -
    4.7          5      -         1.3        13      -
    4.8          2      2         1.4         7      1
    4.9          2      3         1.5        10      2
    5.0          1      3         1.6         3      1
    5.1          1      7         1.7         1      1
    5.2          -      2         1.8         1     11
    5.3          -      2         1.9         -      5
    5.4          -      2         2.0         -      6
    5.5          -      3         2.1         -      6
    5.6          -      6         2.2         -      3
    5.7          -      3         2.3         -      8
    5.8          -      3         2.4         -      3
    >= 5.9       -     13         2.5         -      3
    Total       50     50                    50     50

We observe that on PL the two distributions overlap in the range 4.5 to 5.1. Outside this range there lie 29 cases of versicolor and 34 cases of virginica. On PW there is overlapping in the range 1.4 to 1.8, 28 cases of versicolor and 34 of virginica lying outside it. The total of cases outside the common range being 63 for PL and 62 for PW, we shall take as our first discriminating variable PL.

We then lay down the following rule of discrimination:

    PL <= 4.4           allot to versicolor
    PL >= 5.2           allot to virginica
    4.5 <= PL <= 5.1    refer to next variable.          (44.73)

There are 37 cases lying in the common range 4.5 to 5.1. We take these cases out of Table 44.2 and construct a distribution for them in respect of PW, as in Table 44.3.

Table 44.3. Frequency distribution of 37 cases not distinguished by PL

    Variate     Petal width
    values     Vers.   Virg.
    1.2          1      -
    1.3          2      -
    1.4          4      -
    1.5          9      2
    1.6          3      -
    1.7          1      1
    1.8          1      5
    1.9          -      3
    2.0          -      3
    2.1          -      -
    2.2          -      -
    2.3          -      1
    2.4          -      1
    Total       21     16

Proceeding as before, we see that there is a common range for PW of 1.5 to 1.8. We therefore add to the rule (44.73):

    4.5 <= PL <= 5.1
        PW <= 1.4           allot to versicolor
        PW >= 1.9           allot to virginica
        1.5 <= PW <= 1.8    proceed to next variable.    (44.74)

PL has discriminated 63 cases and PW a further 15. We now classify the 22 undecided cases on sepal length SL and sepal width SW, as in Table 44.4.

Table 44.4. Frequency distributions of 22 cases not distinguished by PL and PW

    Variate    Sepal length      Variate    Sepal width
    values     Vers.   Virg.     values     Vers.   Virg.
    4.9          -      1         2.2         1      1
    5.4          1      -         2.3         -      -
    5.5          -      -         2.4         -      -
    5.6          1      -         2.5         1      1
    5.7          -      -         2.6         -      -
    5.8          -      -         2.7         1      1
    5.9          1      1         2.8         1      2
    6.0          3      2         2.9         1      -
    6.1          -      1         3.0         3      3
    6.2          1      1         3.1         2      -
    6.3          2      2         3.2         2      -
    6.4          1      -         3.3         1      -
    6.5          1      -         3.4         1      -
    6.6          -      -
    6.7          2      -
    6.8          -      -
    6.9          1      -
    Total       14      8                    14      8

Table 44.5—Distribution of 16 cases not distinguished by PL, PW, SW 


Variate     Sepal length
values      Vers.   Virg.

4.9           -       1
5.4           1       -
5.5           -       -
5.6           1       -
5.7           -       -
5.8           -       -
5.9           -       1
6.0           2       2
6.1           -       1
6.2           1       1
6.3           1       2
6.4           -       -
6.5           1       -
6.6           -       -
6.7           1       -

Totals        8       8


... lying outside the common range. For SW there are 6. We therefore take SW
as our next discriminator and add to the rule:

    4.5 ≤ PL ≤ 5.1
    1.5 ≤ PW ≤ 1.8
        SW ≥ 3.1            allot to versicolor
        SW < 3.1            proceed to next variable.    (44.75)

The third variable discriminates a further 6, making 84 altogether and leaving 16
undecided. For these the distribution of SL is given in Table 44.5. For what it is
worth we may now add to (44.75):

    4.5 ≤ PL ≤ 5.1
    1.5 ≤ PW ≤ 1.8
    SW < 3.1
        SL ≥ 6.4            allot to versicolor
        SL ≤ 5.3            allot to virginica
        5.4 ≤ SL ≤ 6.3      undecided.    (44.76)

This leaves us with 87 cases decided and 13 undecided. No further discrimination
is possible.


44.32 The general method will now be clear. It is completely distribution-free, 
depending only on the rank order of the variate values. It brings up one by one the 
variables which are prima facie most important in the discrimination. It involves no 
arithmetic other than counting. 

On the other hand, the discrimination which results is not necessarily optimal. 
Looked at from the geometrical viewpoint, instead of a plane boundary as in Example 
44.1, we have a step-wise boundary. The discrimination on the first variable rules off 
three domains by hyperplanes orthogonal to that variable. The second variable rules 
off similarly in the region of indecision left by the first; and so on. It is possible 
that an optimal method based on distributions may leave a smaller residuum of
undecided cases than the one we propose; but it can do so, of course, only at the expense
of sacrificing the distribution-free nature of the procedure. 
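
A single step of the procedure can be sketched in code. This is only an illustration under stated assumptions, not the authors' own algorithm as printed: it handles two classes and one variable, takes the petal-width frequencies of the 37 undecided cases as read from Table 44.3, and uses a hypothetical helper `common_range_step` that allots every case lying outside the range common to both classes and counts the rest as undecided.

```python
def common_range_step(freq_a, freq_b):
    """One step of the rank-order rule for two classes on a single variable.

    freq_a and freq_b map variate values to frequencies. Cases of either
    class lying outside the range common to both are decided; the cases
    inside the common range are passed on to the next variable.
    """
    low = max(min(freq_a), min(freq_b))    # lower end of the common range
    high = min(max(freq_a), max(freq_b))   # upper end of the common range
    decided_a = sum(f for v, f in freq_a.items() if v < low or v > high)
    decided_b = sum(f for v, f in freq_b.items() if v < low or v > high)
    total = sum(freq_a.values()) + sum(freq_b.values())
    undecided = total - decided_a - decided_b
    return (low, high), decided_a, decided_b, undecided


# Petal width (PW) frequencies of the 37 cases left undecided by petal
# length, as read from Table 44.3.
versicolor = {1.2: 1, 1.3: 2, 1.4: 4, 1.5: 9, 1.6: 3, 1.7: 1, 1.8: 1}
virginica = {1.5: 2, 1.7: 1, 1.8: 5, 1.9: 3, 2.0: 3, 2.3: 1, 2.4: 1}

common, n_vers, n_virg, n_left = common_range_step(versicolor, virginica)
print(common, n_vers, n_virg, n_left)
```

With these frequencies the common range is 1.5 to 1.8, so 7 versicolor and 8 virginica cases are allotted and 22 are carried forward, which agrees with the 22 cases named in the title of Table 44.4. Only comparisons and counts are involved, in keeping with the remark that the method needs no arithmetic other than counting.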





Differences in dispersion 

44.33 We may add a final word on the problem of discrimination when populations 
differ in dispersion but not in means. It is easier to point to the problem than to 
suggest a solution. Consider, for example, Fig. 44.2, where the populations have the 
same mean but different dispersions. There are clearly areas where discrimination is 
possible, but the foregoing methods fail to reveal them. 

If the configuration was the same but the figure was rotated through 45 degrees,
we should arrive at meaningful results by the rank order method. The lines
KK', LL' would rule off domains outside of which crosses were, so to speak, dominant
and likewise MM', NN' would define domains for the circles. The rectangle in the
middle would be a zone of indecision, which would inevitably be large owing to the
nature of the data.

A heuristic procedure in such cases would be to rotate the axes (for measurable
variables), say, by a transformation to principal components. Lubischew (1962) has
discussed the problem in a biological context.

Fig. 44.2 (see text)


44.34 One difficulty of the foregoing method stems from its sensitivity to outlying
values. As we have explained it, only non-overlapping regions are accepted for
discrimination; for variables of effectively infinite range there tends to be more overlap as
sample size increases. It might, therefore, be preferable to accept some misclassification
from the outset by permitting overlap up to a specified amount; or to fit univariate
distributions and estimate the cut-off points to a specified degree of overlap. Much
more remains to be done in this field.


Classification

44.35 Corresponding ... of observations, let us consider the n sample points as
points in a Euclidean space determined by the p variables. If these points, ...
definition, fall into clearly distinguishable groups, we may say that the
individuals may be classified into those groups. Their “nearness” is measured by some
function of the variate values which they bear.

(b) In the alternative ... embedded in an n-space the variables are represented by
vectors. There is ... how far these vectors cluster, as we have seen in ... analysis ...
concerned with the extent to which the variables cluster, not the individuals.

It would be convenient, though it is not general practice, to refer to the first type as
classification analysis and the second as cluster analysis. In the first we accept the
variables and try to classify individuals; in the second, which is perhaps logically anterior
to the first, we are interested in the variables, to see, for example, whether they are all
necessary and which are the more important for the purpose in hand.


44.36 In either case our primary difficulty is to define what we mean by “group”
or “cluster.” There are several ways of doing so, but they all rest on the notion of
“nearness” or “distance.” The consequence is that we have to set up some kind
of metric to determine the distance between two points, and then decide on a distance
within which two points are “near.”

For cluster analysis an obvious distance function of x_j and x_k is the correlation ρ_jk.
We can regard this either as the cosine of the angle between the vectors or as the cosine
of the distance between the end-points of the vectors on the unit hypersphere. The
correlation matrix ρ then sets up our distance function. We have only to decide what
values constitute nearness and how we use them to define a cluster.


44.37 Suppose we decide that points with ρ ≥ 0.7 are near together. One manner
of procedure is then as follows: scan the correlation matrix for pairs with correlation
≥ 0.7. If there are none, no cluster exists. In the contrary case take one pair, say
x_j, x_k. Examine the correlations of other variables with these two. If there is an x_l
such that the average correlation (three values) between x_j, x_k, x_l is ≥ 0.7, add x_l to the
cluster. Proceed if possible to find a fourth such that the average ρ (6 values) is ≥ 0.7;
and so on until the process fails. The resulting vectors are a cluster. Putting these
on one side, repeat the procedure with the remaining variables; and so on until the set
is exhausted.
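
The scanning procedure just described can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the text leaves open which starting pair to take and which candidate to add when several qualify, so the sketch assumes the most highly correlated pair is taken first and the candidate giving the highest average correlation is added first. The function names and the 5 × 5 correlation matrix are hypothetical.

```python
from itertools import combinations


def average_correlation(corr, members):
    """Mean of the pairwise correlations among the listed variables."""
    pairs = list(combinations(members, 2))
    return sum(corr[a][b] for a, b in pairs) / len(pairs)


def correlation_clusters(corr, threshold=0.7):
    """Greedy clustering of variables by average correlation.

    Returns (clusters, leftover), where leftover holds the variables
    that could not be placed in any cluster.
    """
    remaining = list(range(len(corr)))
    clusters = []
    while True:
        seeds = [(a, b) for a, b in combinations(remaining, 2)
                 if corr[a][b] >= threshold]
        if not seeds:                       # no near pair: no further cluster
            break
        # Assumption: start from the most highly correlated pair.
        cluster = list(max(seeds, key=lambda p: corr[p[0]][p[1]]))
        while True:
            scored = [(average_correlation(corr, cluster + [v]), v)
                      for v in remaining if v not in cluster]
            scored = [s for s in scored if s[0] >= threshold]
            if not scored:                  # the process fails: cluster complete
                break
            cluster.append(max(scored)[1])  # assumption: best candidate first
        clusters.append(sorted(cluster))
        remaining = [v for v in remaining if v not in cluster]
    return clusters, remaining


# Hypothetical correlation matrix for five variables.
example = [
    [1.00, 0.90, 0.80, 0.10, 0.00],
    [0.90, 1.00, 0.75, 0.20, 0.10],
    [0.80, 0.75, 1.00, 0.10, 0.20],
    [0.10, 0.20, 0.10, 1.00, 0.85],
    [0.00, 0.10, 0.20, 0.85, 1.00],
]
clusters, leftover = correlation_clusters(example)
print(clusters, leftover)
```

For this matrix the sketch groups the first three variables and the last two. A different rule for choosing the starting pair may give a different result, which is the non-uniqueness noted in the next paragraph.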

The procedure is fairly easy to apply for a number of vectors of reasonable size— 
and in practice the number rarely exceeds 50. But it may not be unique in the sense 
that where we have a choice of starting pairs the ultimate result may depend on which 
we choose. If computational facilities were available, it might be possible to split 
the p vectors into groups in all possible ways, the number of non-unitary partitions of p,
and examine the clustering within each partition. But this would probably overtax 
the capacity of the largest computer. 

For some further studies see Tryon (1939) and Fortier and Solomon (1966). The 
methods of cluster analysis have not been much used by statisticians and are worthy 
of further study, for example in the discarding of redundant variables in regression analysis, 
structure analysis, discriminant analysis and, indeed, in multivariate analysis generally. 
It must be remembered that correlation coefficients are quantities of a highly summary
kind, and it is prudent, as a preliminary in all these cases, to draw some of the bivariate
scatter diagrams in order to get an overall view of the nature of the variation. 


44.38 The method of cluster analysis by correlations has the advantage of being
independent of the scale of measurement of any particular x_j. We are, so to speak,
concerned with the number of components (p), not the way in which any one is measured.
Such a method is not distribution-free, but if any worry is felt about the non-normality
of the data, the original variables can be replaced by ranks and the correlation procedure
still applies. In fact, we can extend the method to cover qualitative data, provided that
the categories in which they occur are orderable (see, for example, 33.36, Vol. 2). The
metric, one might claim, is a natural one.

But when we consider the grouping problem of n points in a p-space such
considerations no longer apply. “Distances” may be greatly affected by altering the
scale of one of the variables and, indeed, can be assigned almost any values we like by
stretching scales in the right way. Sometimes the difficulty can be overcome, or at
least partially met, by an initial standardization. We prefer, however, a distribution-free
method based on ranks.


44.39 We shall now set up a distance function, not between variables but between
individuals. Thus, if the variate values of the jth and kth members are x_1j, x_2j, ..., x_pj
and x_1k, x_2k, ..., x_pk, we require a measure of correlation between them. To compute
a correlation based on the values as they stand would be nugatory; for example, changing
the sign of one vector variable would alter the value of product-moment correlations.
We therefore replace the n values of any component x_j by a set of ranks from 1 to n.
These ranks may be tied for any set of members exhibiting the same value of x_j; and
in particular qualitative data in ordered categories may be regarded as tied rankings,
so that our method has a very general application. The p × n matrix of rank values
typified by r_αβ, α = 1, 2, ..., p; β = 1, 2, ..., n, replaces the original matrix.

For each of the pairs of sample members we calculate the function, analogous to a
chi-squared measure,

    D_{jk} = \sum_{\alpha=1}^{p} \frac{(r_{\alpha j} - r_{\alpha k})^2}{\operatorname{var} r_\alpha}.    (44.77)

The variance of a set of n ranks depends on the number of ties present. If there are
ties of t_1, t_2, ..., etc. members, we have

    \operatorname{var} r_\alpha = \frac{1}{12n}\left\{(n^3 - n) - \sum (t^3 - t)\right\}.    (44.78)

(Cf. Exercise 44.4.)
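
The rank distance may be illustrated by a short sketch. It assumes that (44.77) is the sum over the p components of the squared rank differences divided by the rank variance, and that (44.78) gives that variance as {(n³ − n) − Σ(t³ − t)}/12n; tied members receive the average of the ranks they span. The function names and the small data set are hypothetical.

```python
from collections import Counter


def tied_ranks(values):
    """Ranks from 1 to n, tied members sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1            # mean of positions i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_variance(ranks):
    """Variance of n ranks with ties, following the reading of (44.78)."""
    n = len(ranks)
    tie_term = sum(t ** 3 - t for t in Counter(ranks).values())
    return ((n ** 3 - n) - tie_term) / (12 * n)


def distance(rank_rows, j, k):
    """D_jk between members j and k, following the reading of (44.77)."""
    return sum((r[j] - r[k]) ** 2 / rank_variance(r) for r in rank_rows)


# Hypothetical data: p = 2 components observed on n = 4 members.
scores = [[3, 1, 3, 2], [10, 20, 30, 40]]
rank_rows = [tied_ranks(row) for row in scores]
d01 = distance(rank_rows, 0, 1)
print(rank_rows, d01)
```

In the first component two members tie for ranks 3 and 4 and each receives 3.5; the variance falls from 1.25 to 1.125, as the tie correction requires.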


44.40 A practical difficulty, as in most classification procedures, arises from the
number of pairs which can be chosen from n members, namely ½n(n − 1). Thus, for
a sample of 100 there are 4950 pairs, each with a value of D_jk. To proceed in the
manner of 44.37 and form groups by adding one member at a time is a sufficiently 
complicated exercise to require a computer; but it presents no theoretical problems. 


Example 44.7

A heuristic procedure which gives at least a preliminary idea of the extent of
grouping may be illustrated on some of the data of Table 44.1 (Kendall, 1966).
... was constructed from the figures for versicolor and virginica by regarding them
... of 100 of unknown origin. The variable sepal ... in the ordinary way from 1 to 100.
Petal length was split into four categories, values < 4, ≥ 4 and < 5, ≥ 5 and < 6,
and ≥ 6. Petal width was compressed into two categories, < 2 and ≥ 2, a very ...
The ... values of D_jk were computed and used to sort the 100 members into classes.

The data gave two well-defined classes comprising 58 (A) and 25 (B) members, none ...
overlapping. Of the 17 remaining there was another group ... with A. It was therefore
decided to amalgamate ... The remaining 8 did not fall into a ... defined class.

There was thus fairly clear evidence of two classes, and only two. But whereas B
contained only versicolor, A contained 48 virginica and 19 versicolor. On this basis
we should correctly arrive at the number of classes, but misclassify 19 and leave 8
doubtful. In the analogous problem of discrimination we decided 87 cases and left
13 doubtful. However, in the present example we sacrificed a good deal of information
by grouping petal length and petal width, so the results are not discordant. Reference
may be made to Kendall (1966) for details.


44.41 The subject is far from being exhausted. There are several ways of dividing 
members into groups, even when a suitable distance metric has been decided upon. 
For example, it is possible to consider a classification based on the intra-group distance 
in relation to the between-group distance. Wald (1944) proposed a statistic of this 
kind which is closely akin to the discriminant function. 

As a final comment, we would remark that, under the influence of the papers by 
Fisher (1936) and Wald (1944), statisticians have tended to approach the problems of 
discrimination and classification by looking for a single function of the variables. This 
appears to us to be a procedure which, in many circumstances, may be too restrictive. 
What is required is an allocation rule or set of rules; and this may or may not make 
use of a linear function, or a single function, of the variables. 


EXERCISES 

44.1 Taking x_1 and x_2 as sepal length and sepal width, respectively, for the data of Iris setosa
and versicolor in Example 44.1, show that the linear discriminant function is 

—1-236 x 2 

and that this is nearly as good as the four-variable discriminator of the example. 


44.2 Show that the discriminating boundary given by (44.9) may be written

    U = \mathbf{x}'\mathbf{V}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) - \tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)'\mathbf{V}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) = 0,

where V is the parental dispersion matrix.

Show that if x is distributed as N(μ_1, V), U has mean equal to

    \tfrac{1}{2}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\mathbf{V}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) = \tfrac{1}{2}\alpha, say,

and variance α.

If x is distributed according to N(μ_2, V), show that U has mean −½α and variance α.

(T. W. Anderson, 1958. The distribution when sample
estimates are inserted for μ and V is very complicated
but asymptotically is the same. See Wald (1944),
Sitgreaves (1952), T. W. Anderson (1951), and John
(1961).)
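
The moments asked for in Exercise 44.2 follow from the linearity of U in x. The lines below are a sketch of that step only, taking U = x'V⁻¹(μ_1 − μ_2) − ½(μ_1 + μ_2)'V⁻¹(μ_1 − μ_2) and writing α = (μ_1 − μ_2)'V⁻¹(μ_1 − μ_2); the symbols are those of the exercise and nothing further is assumed.

```latex
% Under population 1, E(x) = \mu_1 and the dispersion matrix of x is V.
\begin{align*}
E(U) &= \mu_1' V^{-1}(\mu_1 - \mu_2)
        - \tfrac{1}{2}(\mu_1 + \mu_2)' V^{-1}(\mu_1 - \mu_2) \\
     &= \tfrac{1}{2}(\mu_1 - \mu_2)' V^{-1}(\mu_1 - \mu_2)
      = \tfrac{1}{2}\alpha, \\
\operatorname{var} U &= (\mu_1 - \mu_2)' V^{-1}\, V\, V^{-1}(\mu_1 - \mu_2)
      = \alpha .
\end{align*}
% Under population 2 the same steps with E(x) = \mu_2 give
% E(U) = -\tfrac{1}{2}\alpha and \operatorname{var} U = \alpha.
```

Since U is a linear function of a normal vector it is itself normal, so these two moments determine its distribution in either population.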
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v , are drawn from population 1 which is N(y. u V). * H| * 

44.3 a;, *u, *«i» • • n A which i s M|x 2 , V). Consider this against the alternative h,' * > *n , 
are drawn from popu. a ] rawn f rom population 1 and x, x 12 , . . . , x, h2 from population 2 P °2S 
S "likelihood ratio for testing the composite hypothests is • SU* 

l+^rCx-iO'Y-Hx-^ 


14* ~—~i (x-xi) v_1 ( x_ *i) 

(T. W. Anderson, 195 ^ 

44.4 In a set of n ranks the ranks p_k + 1, p_k + 2, ..., p_k + t are tied and allotted the
value p_k + ½(t + 1). Show that their sum of squares is reduced by (1/12)(t³ − t). Deduce the
formula for the variance of a ranking with t_1, t_2, ..., t_m ties:

    \operatorname{var} r = \frac{1}{12n}\left\{(n^3 - n) - \sum_{j=1}^{m} (t_j^3 - t_j)\right\}.

44.5 The data for versicolor and virginica may be classified by petal length and petal width
in the following ordered contingency table. (For petal width “small” means < 1.5; for petal
length “small” means < 4.0, “medium” means ≥ 4 and < 5, “large” means ≥ 5. Figures
to the left of the colon refer to versicolor, those to the right to virginica.)

                      Petal width
Petal length      Small      Large      Totals

Small             11:0        0:0        11:0
Medium            24:0       13:6        37:6
Large              0:1        2:43        2:44

Totals            35:1       15:49       50:50

... which it falls, the probability of misclassification is estimated at 8 per cent. Consider the
meaning and reliability of this figure.
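
The 8 per cent of Exercise 44.5 can be checked by counting. The sketch below assumes the allocation rule is to assign every member of a cell to the species in the majority in that cell, so that the minority count in each cell is misclassified; the cell counts are as read from the table, and the variable names are hypothetical.

```python
# Cell counts (versicolor, virginica) as read from the table of
# Exercise 44.5, keyed by (petal length category, petal width category).
cells = {
    ("small", "small"): (11, 0),
    ("small", "large"): (0, 0),
    ("medium", "small"): (24, 0),
    ("medium", "large"): (13, 6),
    ("large", "small"): (0, 1),
    ("large", "large"): (2, 43),
}

total = sum(vers + virg for vers, virg in cells.values())
# Assumption: each cell is allotted to its majority species, so the
# smaller of the two counts in a cell is the number misclassified there.
misclassified = sum(min(vers, virg) for vers, virg in cells.values())
rate = misclassified / total
print(misclassified, total, rate)
```

Because the rule is fitted to the same 100 members on which it is scored, the resulting proportion is likely to understate the error to be expected on fresh material, which bears on the reliability the exercise asks about.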

from each of two populations 

variance, and those in the first ponulatinn ho, cb P°P u l atl °n is scaled to have unit 

= 1, 2 , . . . , p, and am taken as nos ^ ^ F**' The ° ther means are g iven b ? 

If ^ alone is used as discriminant show that ° Tut S * gn ob X} ‘ ^ necessary), 
either kind being equally important, is 6 P 1 lability °f misclassification, errors of 

1 f« 

(Cochran (1964 V(2^J ^ < “ ^ 2 ) * 

the variable should be tSrded as^ poor discrimbltor ^ Pr ° babiIity is lar S e > say 8 ^ <h 
^ dis’^ are independent, show that the best com- 

V 

2 &jXj/ < 5 ^. 

1 j 
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th e 


for 

first 


tVi° 
yari&t® 


• dependent variates x x , x 2 with values <5 1( <5 a (<5 X > <5 2 ) show that an observation on 
1 is equivalent t0 m nbservations on the second variate, where m = 8\/8 f. 


idep en 

discrinn 


the previous exercise, let the vaiiables x 2 have correlation p. By considering 
n t variables x 2 — Pi #i and show that if d 2 = fS lt 0 < 1, the correlation improves 

^ _tirnn 1 A V\n if +V\n __• 1 1 • 1 1 .1 


tl* "Ration 
the 


L - V X / ” J ; V/ T V 

over what it would be if the variables were independent, provided that 

(f-py 


\-p‘ 


■>/* 


that a negative correlation always helps the discrimination but that a positive correla- 
« e "“Wul unless l»im+P)- 

tion lS (Cochran, 1964) 

. Continuing the previous exercise, suppose that the correlations between any pair 
^ are the same and equal to p. Show that if p is negative, a discriminator based on all p 
x i> ,^j es - s fetter than it would be if they were independent; but that if p is positive this i 


P 

is not 


so 


unless 


( v \ 2 v 

Z 8A - 2 

i -i 

1 m 


(#-l) S 

5 =1 


(Cochran, 1964) 


S 

#> 


44.10 Show that discrimination with two populations may be formally represented as a 
Least Squares regression analysis in which the dependent variable y can assume only two values, 
namely m = n_2/(n_1 + n_2) for the n_1 members of the first population and (m − 1) for the n_2 members
of the second. This yields the boundary in Exercise 44.2 without the assumption of
multinormality.

44.11 Discuss the approach of Exercise 44.10 for more than two populations, and show 
why it breaks down. 


CHAPTER 45 

TIME-SERIES: GENERAL 

45.1 Observations on a phenomenon which is moving through time generate an ...
values assumed by a variable at time t may or may not embody an element of random
variation, but ... which we shall be concerned some such element is present ...
We may regard a set of values u(t_1), u(t_2), ..., u(t_n) as a
multivariate complex. Their characteristic feature, however, is that the order of the
set t_1, t_2, ..., t_n is material and not, for example, accidental as would be for a random
sample x_1, ..., x_n in which the suffixes are adjoined for convenience of identification.


Table 45.1—Annual yields per acre of barley in England and Wales from 1884 to 1939
(Data from the Agricultural Statistics)

Year   Yield per      Year   Yield per      Year   Yield per      Year   Yield per
       acre (cwt)            acre (cwt)            acre (cwt)            acre (cwt)

1884     15.2         1898     16.9         1912     14.2         1926     16.0
1885     16.9         1899     16.4         1913     15.8         1927     16.4
1886     15.3         1900     14.9         1914     15.7         1928     17.2
1887     14.9         1901     14.5         1915     14.1         1929     17.8
1888     15.7         1902     16.6         1916     14.8         1930     14.4
1889     15.1         1903     15.1         1917     14.4         1931     15.0
1890     16.7         1904     14.6         1918     15.6         1932     16.0
1891     16.3         1905     16.0         1919     13.9         1933     16.8
1892     16.5         1906     16.8         1920     14.7         1934     16.9
1893     13.3         1907     16.8         1921     14.3         1935     16.6
1894     16.5         1908     15.5         1922     14.0         1936     16.2
1895     15.0         1909     17.3         1923     14.5         1937     14.0
1896     15.9         1910     15.5         1924     15.4         1938     18.1
1897     15.5         1911     15.5         1925     15.3         1939     17.5


Table 45.3—Average number of eggs per laying hen in the U.S.A. for each
month of the years 1938-1940

(Data from Report of the Bureau of Agricultural Economics, U.S. Dept.
of Agriculture, on the Poultry and Egg Situation, March 1941)

Year   Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.

1938    7.9    9.9   15.4   17.5   17.3   14.9   13.6   11.8    9.4    7.5    5.9    6.4
1939    8.0    9.7   14.9   17.0   17.0   14.6   13.2   11.7    9.3    7.4    6.0    6.8
1940    7.2    9.0   14.4   16.5   17.0   14.8   13.4   11.8    9.7    7.9    6.2    6.8


Fig. 45.3 Graph of the data of Table 45.3 (egg production)


45.2 Although the variable t will always be spoken of and thought of as a time
variable, the theory we develop has obvious applications to ... variation in thickness
of a cotton thread along its length ...
45.3 A further feature which distinguishes the set u(t) from the values of a
multivariate complex is that t is continuous, and we may therefore have to consider an
infinity of values of u(t). It is customary and convenient (though not, perhaps, very
exact) to speak of a continuous time-series when we mean that t is continuous, not
necessarily implying that u(t) is continuous for any given t in the variables under
discussion. Likewise, by a discontinuous series we mean one given at a (discontinuous)
set of points t_1, t_2, ..., t_n, although u itself may be a continuous variable such as a
length or a weight.

For the greater part of our treatment we shall be concerned with discontinuous
series and shall indicate applications to the continuous case where necessary. In fact,
we shall mostly deal with series which are defined at equidistant points of time; and,
taking the time-interval as unit, we may denote the values by u_0, u_1, u_2, etc. If we
require to discuss values for time-points prior to the starting point u_0, we may similarly
denote them by u_{-1}, u_{-2}, and so on.


Table 45.5—Population of England and Wales at ten-yearly intervals from 1811 to 1961

(Data from the Registrar-General’s Statistical Review)

Year   Population     Year   Population
       (millions)            (millions)

1811     10.16        1891     29.00
1821     12.00        1901     32.53
1831     13.90        1911     36.07
1841     15.91        1921     37.89
1851     17.93        1931     39.95
1861     20.07        1941       -
1871     22.71        1951     43.76
1881     25.97        1961     46.07


Fig. 45.5 Graph of the data of Table 45.5 (population of England and Wales)


Some examples of time-series

45.4 The reader is doubtless familiar with many examples of time-series such as
occur in practice, e.g. the sales curve of a commodity over a period of years, the records
of the barometric pressure at a locality, drawn by a stylus on a rotating
drum, the population of a country at a series of census dates, and so forth. We proceed
to give a few specific examples which will indicate the kind of domain to be covered
and act as numerical exemplification of the theory to be developed later.

Table 45.1 (illustrated in Fig. 45.1) gives the annual yields per acre of barley in
England and Wales from 1884 to 1939. Table 45.2 (Fig. 45.2) gives the annual rainfall
in London for each year from 1813 to 1912. Table 45.3 (Fig. 45.3) gives the average
number of eggs per laying hen in the U.S.A. for each month of the years 1938 to 1940.
Table 45.4 (Fig. 45.4) gives the sheep population of England and Wales as at June 4th
of each year from 1867 to 1939. Table 45.5 (Fig. 45.5) shows the human population
of England and Wales at 10-yearly intervals from 1811 to 1961.


45.5 These series are fairly typical of the kind of material with which our theory 
has to deal. The data of Table 45.1 (barley) present a very irregular fluctuation but, 
so far as the eye can see, there is no systematic element and no tendency towards increase 
or decrease over the period given. Table 45.2 has some indications of oscillatory 
movements of a more regular kind. Table 45.3 provides an oscillatory effect which is 
definitely seasonal. Table 45.4 (sheep population) combines a general decline in 
numbers with marked oscillatory effects. Table 45.5 (human population) shows a 
regular growth without apparent fluctuation. 


Types of discontinuity 

45.6 The tables also illustrate various types of discontinuity to which observed 

series are subject: 

(a) In the barley series we have a case of essential discontinuity. There is one and 
only one yield per acre for each year. The actual time of harvest may vary from 
year to year but, roughly speaking, the intervals between successive observations 
are equal. 

(b) In the population series we have a discontinuity of observation, due to the fact 
that a census is taken only every ten years. The variable, however, exists all 
through the period covered and could be observed (theoretically) at any point of 
time. The same is true of the sheep series. 

(c) In the rainfall data we have a discontinuity due to aggregation. The “ rainfall ” 
does not exist at a single point of time; it is the summation over a finite time-interval 
which is of interest. That interval, of course, is at choice. We may observe by 
year, by month, by day or even by hour. Intervals may overlap, as when we 
compile in successive weeks the rainfall for the previous month. Such data are 
nevertheless discontinuous time-series in our sense. 
(d) When we have a continuous record, as on a barograph, we cannot tabulate it
after the manner of Tables 45.1-45.5. We can take readings where we like, but
not everywhere. In consequence, ... by ... computation except as an approximation.
... however, analyse them by methods involving graphical integration, e.g. by a
planimeter or some more elaborate device.

45.7 In practice the time-points at which we observe the series are often determined 
for us, especially in economics. In experimental situations we may be able to decide
them ourselves before the data are collected, or afterwards if a full record has been kept.
The question what is the best interval of observation is to be decided in the light of 
the circumstances of the individual case, and is not one on which we can enter at this 
point. (Very little theoretical work has, in fact, been done on it.) We may note here, 
however, that observation at fixed equal intervals, convenient as it may be, can suppress 
evidence of oscillatory movements which have a period equal to those intervals or some 
sub-multiple of them. The annual observation of the sheep population, for example, 
will take no account of seasonal variation within the year due to slaughtering or breeding; 
the annual rainfall figures conceal the fact that rainfall is seasonal to some extent, even 
in London. 
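
The suppression of oscillations by equally spaced observation can be seen in a small numerical sketch. The series below is hypothetical: a pure sine wave about a fixed level, with time in months, read once a year. Both a 12-month cycle and a 6-month cycle (a sub-multiple of the observation interval) disappear from the annual readings, whereas monthly readings show the full swing.

```python
import math


def seasonal_series(t_months, period_months=12.0, level=10.0, amplitude=3.0):
    """Hypothetical series: a fixed level plus a sine wave of given period."""
    return level + amplitude * math.sin(2.0 * math.pi * t_months / period_months)


def spread(values):
    """Range (largest minus smallest) of the observed values."""
    return max(values) - min(values)


# Ten annual readings of a 12-month cycle and of a 6-month cycle.
annual = [seasonal_series(12 * k) for k in range(10)]
half_year_cycle = [seasonal_series(12 * k, period_months=6.0) for k in range(10)]
# Twenty-four monthly readings of the 12-month cycle.
monthly = [seasonal_series(m) for m in range(24)]

print(spread(annual), spread(half_year_cycle), spread(monthly))
```

The annual readings are constant apart from rounding error, so no analysis of them could recover the oscillation; the monthly readings range over about twice the amplitude.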

Calendar trouble 

45.8 Whether time-intervals are equal or not, it is obviously desirable that observations
should be comparable inter se. For series which are based on days or months
there are certain nuisance-effects, due to the nature of the calendar, which have to be 
removed to ensure comparability. Some of these difficulties we can lay at the door of 
Nature, for not arranging that the year shall contain an integral number of days; but 
most of them are attributable to the man-made calendar. Months, for example, are 
not the same length; public holidays affect the comparability of economic and social 
data; exchanges and markets close over the week-end; and so on. Experimentally 
generated series are usually free from such difficulties if due care is taken, but they 
can arise in industrial series both in the large (e.g. stoppages due to strikes) or in the 
small (e.g. meal-breaks). We shall suppose that our data have been corrected for 
such effects so as to bring them on to a comparable basis. 

The problems of time-series analysis 

45.9 The ultimate object of analysis of a time-series—as of statistical analysis as a 
whole—is to arrive at a deeper understanding of the causal mechanisms which generated 
it, either out of sheer curiosity or because we wish to extrapolate into the future. It 
does not follow, however, that such understanding can be achieved by considering one 
series alone; for the series may be only a single facet of a complex phenomenon
generating a number of different series. We shall revert to this question in ...
multivariate systems. For the present we curb our ambition to ..., confining ourselves
to the study of the type of behaviour ... setting up models which can generate it,
recognizing that such ... only portions of a more basic structural system. We
shall see later that no logical inconsistency need be produced by this approach.
45.10 A survey of the practical examples we have given and of others known to
the reader suggests that the typical time-series may be composed of four parts: 

(a) a trend, or long-term movement; 

(b) oscillations about the trend, of greater or less regularity; 

(c) a seasonal effect; 

(d) a “ random,” “ unsystematic ” or “ irregular ” component. 

As a matter of mathematical description, we can always represent a series as one 
of these constituents or the sum of several of them. A large part of the traditional 
theory of time-series, in fact, is devoted to an analysis of the data into such components, 
so as to isolate them for separate study. We must, however, attempt to avoid a trap 
here. It does not follow that if we can represent a series as a sum of such components, 
they correspond to independently operating causal systems. The decomposition of a 
series is very often useful, but it may be misleading and in any case is not the ultimate 
object of statistical analysis. 

45.11 Perhaps the easiest component to understand and to remove from the series 
is the seasonal effect. This is a fluctuation imposed on the series by a cyclic phenomenon 
external to the main body of causal influences at work upon it. The oscillation in 
egg-production in Table 45.3, for instance, reflects the rhythm in the reproductive 
process which is found among birds in virtue, ultimately, of the fact that the earth goes 
round the sun once a year. We shall confine the word “ seasonal ” to those effects 
which are annual in period; but the same ideas can be applied to any phenomenon 
generated by strictly periodic natural processes, such as “spring” and “neap” variation
in tides or daily variation in temperature. We must, however, be careful about
extending the notion of seasonality to phenomena which are not demonstrated beyond 
reasonable doubt to depend on strictly periodic stimuli. For instance, it would be 
going too far, in the present state of our knowledge, to speak of sunspot variation as 
seasonal in this sense, and much too far to speak of seasonality in crop-yields as determined
by sunspots, even if the relation between the two were established. We shall
return to this point below when defining what we mean by a “ cycle ” as distinct from 
an “ oscillation.” 
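
A crude picture of the seasonal effect in the egg series can be had by averaging each calendar month over the three years. This is a sketch under simple assumptions, not a method laid down in the text: the figures are as read from Table 45.3, any trend over the three years is ignored, and the seasonal effect of a month is taken to be its mean less the overall mean. The variable names are hypothetical.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Average number of eggs per laying hen, as read from Table 45.3.
eggs = {
    1938: [7.9, 9.9, 15.4, 17.5, 17.3, 14.9, 13.6, 11.8, 9.4, 7.5, 5.9, 6.4],
    1939: [8.0, 9.7, 14.9, 17.0, 17.0, 14.6, 13.2, 11.7, 9.3, 7.4, 6.0, 6.8],
    1940: [7.2, 9.0, 14.4, 16.5, 17.0, 14.8, 13.4, 11.8, 9.7, 7.9, 6.2, 6.8],
}

n_years = len(eggs)
# Mean for each calendar month over the years available.
monthly_mean = [sum(year[m] for year in eggs.values()) / n_years
                for m in range(12)]
overall_mean = sum(monthly_mean) / 12
# Assumption: seasonal effect = monthly mean minus overall mean.
seasonal_effect = [value - overall_mean for value in monthly_mean]

for name, effect in zip(months, seasonal_effect):
    print(f"{name}: {effect:+.2f}")
```

On these figures the effect is largest in May and smallest in November, and the twelve effects sum to zero by construction. With only three years the estimates are rough, but the regularity of the pattern from year to year is what marks the fluctuation as seasonal.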

45.12 The concept of trend is more difficult to define. Generally, one thinks of 
it as a smooth broad motion of the system over a long term of years, but “ long ” in 
this connexion is a relative term, and what is long for one purpose may be short for 
another. For example, if we were examining rainfall records over a hundred years, a 
slow rise from the beginning of the period to the end would be regarded as a trend; 
but if we possessed records for two thousand years (and the rings in some of the giant 
redwood trees give an index of climatic conditions for periods of this order) the rise 
over a particular century might appear as part of a slow oscillatory movement, so that 
any inference from the “ trend ” in a particular century to the effect that the weather 
was likely to continue becoming wetter and wetter might be quite false. What inference 
we should make in practice would depend on what we were trying to do. If we were 
engineers designing a water-supply system and wished to provide against droughts of 
reasonable extent, we might perhaps assume that the trend would last as long as our
works and proceed accordingly; but if we were attempting to study climatic changes
over the face of the earth for geological periods of time we should accept the continuance 
of the trend with the greatest reserve or, more probably, should reject it on collateral 
grounds. 

45.13 However long a series may be, we can never be certain, and often not even 
reasonably sure, that a trend in it is not part of a slow oscillation, except of course 
when the series has terminated (as might, for instance, be the case if we were considering
the lengths of reigns of the Roman Emperors). In speaking of a trend, therefore,
we must bear in mind the length of the series to which our statement refers.
Perhaps it would be more accurate to speak of slow or quick movements rather than of 
trend and oscillation, but even so the distinction between the two would remain a matter 
of subjective judgement to some extent. 

45.14 When seasonal variation and trend have been removed from the data we are 
left with a series which will present, in general, fluctuations of a more or less regular 
kind. Fig. 45.1 represents the kind of series we obtain, since it has no components 
of trend or seasonality. The question then arises, is this residual series systematic in 
the sense that its values can be represented as a function of the time? Or, on the 
other hand, are the values random in the sense that they could occur, in the observed 
order, by random sampling from a homogeneous population? Or again, is there some 
possibility intermediate between complete functional variation and complete random¬ 
ness? The search for systematic effects in residual fluctuation gives rise to several 
techniques of analysis, the object of which is to detect whether any part of the series is 
subject to law, and therefore predictable, and whether any part is purely haphazard. 
The former part we shall call systematic, and it will be referred to as an “ oscillation ” 
(not a “ cycle,” which is a very special case of an oscillation, as we shall see later). 
The remainder of the series we shall call the unsystematic component, and refer to its 
movements as “ random ” or “ stochastic.” When a series is a mixture of oscillation
and random movement it will not cause any inconvenience to refer to the up-and-down 
movement generally as fluctuation before we have analysed it into its constituents; 
that is to say, we may speak of fluctuation without prejudice to the possibility of detect¬ 
ing oscillatory movements in it. 


Tests of randomness 

45.15 Some of the series with which we are concerned are clearly not random. It 
would be a waste of time to test the data of Tables 45.3 and 45.5 for the presence of
some systematic effects. In some cases, however, it is not obvious whether systematization
is present, as for example in Table 45.1 (barley yields) and Table 45.2 (rainfall).
In such cases we ask: could the n observed values have arisen
by chance in that order by sampling independently on n occasions from a population
of unknown characteristics?

45.16 There is no limit to the number of tests which can be set up for this purpose.
In choosing the most suitable we must have regard to a number of criteria:

(a) If possible, the test should be distribution-free. 

(b) Since we may wish to test fairly long series, the calculations should be kept to a 
minimum. 

(c) Although we may not be able to specify an alternative hypothesis with precision, 
we may have some idea of its nature and can select a test which is likely to have high 
power against the alternative. For example, if we suspect trend we may find it 
useful to employ a different test from one used to test against periodicity. 

We proceed to consider some tests satisfying these criteria. 

Turning points 

45.17 One of the easiest tests to apply is to count the number of peaks or troughs 
in the series. A “ peak ” is a value which is greater than the two neighbouring values. 

If there are two or more equal values which are greater than their predecessor and 
successor (a rare event in general) we shall regard them as defining one peak. Likewise 
a “ trough ” is a value which is lower than its two neighbours. Our first question is: 
What is the distribution of peaks in a random series? (The distribution of troughs is 
evidently the same with a change of sign of the variate.) 

In point of fact, we shall find it more convenient to treat both peaks and troughs
as cases of “ turning points ” of the series. The number of turning points is clearly 
one less than the number of runs up and down in the series. The interval between 
two turning points is called a “ phase.” 

45.18 Three consecutive observations are required to define a turning point, say
$u_1, u_2, u_3$. If the series is random these three values could have occurred in any order,
namely in six ways. In only four of these ways would there be a turning point (when
the greatest or least value is in the middle). Hence the probability of a turning point
in a set of three values is $\tfrac{2}{3}$.

Consider now a set of values $u_1, u_2, \ldots, u_n$, and let us define a marker variable
$X_i$ by

$$X_i = \begin{cases} 1, & u_i < u_{i+1} > u_{i+2} \ \text{or}\ u_i > u_{i+1} < u_{i+2}, \\ 0 & \text{otherwise}; \end{cases} \qquad i = 1, 2, \ldots, n-2. \tag{45.1}$$

The number of turning points p is then simply

$$p = \sum_{i=1}^{n-2} X_i. \tag{45.2}$$

We have at once

$$E(p) = \sum E(X_i) = \tfrac{2}{3}(n-2). \tag{45.3}$$


Also

$$E(p^2) = E\left\{\left(\sum_{i=1}^{n-2} X_i\right)^2\right\} = E\left\{\sum_{n-2} X_i^2 + 2\sum_{n-3} X_i X_{i+1} + 2\sum_{n-4} X_i X_{i+2} + \sum_{(n-4)(n-5)} X_i X_{i+k}\right\}, \qquad k \neq 0, \pm 1, \pm 2, \tag{45.4}$$

where the suffixes to the $\Sigma$ signs indicate the number of terms over which summation
takes place. As a check note that

$$(n-2)^2 = (n-2) + 2(n-3) + 2(n-4) + (n-4)(n-5).$$

We then have

$$E(p^2) = (n-2)E(X_i^2) + 2(n-3)E(X_i X_{i+1}) + 2(n-4)E(X_i X_{i+2}) + (n-4)(n-5)E(X_i X_{i+k}). \tag{45.5}$$

Since $X_i^2 = X_i$ we have

$$E(X_i^2) = \tfrac{2}{3}. \tag{45.6}$$

For $k > 2$, $X_i$ and $X_{i+k}$ are independent, for they have no value of u in common. Thus

$$E(X_i X_{i+k}) = E(X_i)E(X_{i+k}) = \tfrac{4}{9}. \tag{45.7}$$

It remains to evaluate $E(X_i X_{i+1})$ and $E(X_i X_{i+2})$. For the first, consider four
consecutive terms which, in ascending order of magnitude, may be denoted by the
numbers 1, 2, 3, 4. The only non-vanishing contribution to $X_i X_{i+1}$ which can arise
from a permutation of these numbers arises when there is a turning point in the second
and third places. If the reader will write down the 24 possible permutations he will
find that only ten make a non-vanishing contribution, namely

1324 2143 3142 4132 

1423 2314 3241 4231 

2413 3412 


Thus

$$E(X_i X_{i+1}) = \frac{10}{24} = \frac{5}{12}. \tag{45.8}$$

For $X_i X_{i+2}$ we have to write down the 120 permutations of the integers 1 to 5 and
count up those with turning points at both the second and fourth places. There
are, in fact, 54 and thus

$$E(X_i X_{i+2}) = \frac{54}{120} = \frac{9}{20}.$$

Substituting in (45.5) we find, on reduction,

$$E(p^2) = \frac{40n^2 - 144n + 131}{90}. \tag{45.9}$$

Hence, using (45.3), we have

$$\operatorname{var} p = \frac{16n - 29}{90}. \tag{45.10}$$

Higher moments can be obtained in a similar manner. We find

$$\kappa_3(p) = -\frac{16(n+1)}{945}, \tag{45.11}$$

$$\kappa_4(p) = \frac{-1408n + 3317}{18900}. \tag{45.12}$$


Thus, in standard measure, $\kappa_3(p)$ is approximately $-0.2\,n^{-1/2}$ and $\kappa_4(p)$ is approximately
$-2.4\,n^{-1}$, indicating a fairly rapid tendency to normality as n increases.
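
The probabilities entering (45.5) and the resulting mean and variance of p can be checked by direct enumeration of permutations. The following Python sketch is an illustration added for that purpose and is not part of the original derivation; the function names `markers` and `expectation` are hypothetical, and only the definitions (45.1) and (45.2) are assumed.

```python
from fractions import Fraction
from itertools import permutations


def markers(u):
    """Return the marker variables X_i of (45.1) for the sequence u."""
    return [
        1 if (u[i] < u[i + 1] > u[i + 2]) or (u[i] > u[i + 1] < u[i + 2]) else 0
        for i in range(len(u) - 2)
    ]


def expectation(n, statistic):
    """Exact expectation of statistic(X) over all n! orderings of n distinct values."""
    total = Fraction(0)
    count = 0
    for perm in permutations(range(n)):
        total += statistic(markers(perm))
        count += 1
    return total / count


# Probabilities entering (45.5)
e_x = expectation(3, lambda x: x[0])                # E(X_i)
e_adjacent = expectation(4, lambda x: x[0] * x[1])  # E(X_i X_{i+1})
e_lag_two = expectation(5, lambda x: x[0] * x[2])   # E(X_i X_{i+2})

# Exact mean and variance of p for a short series, n = 6
n = 6
mean_p = expectation(n, sum)
var_p = expectation(n, lambda x: sum(x) ** 2) - mean_p ** 2

print(e_x, e_adjacent, e_lag_two)  # compare with 2/3, 5/12, 9/20
print(mean_p, var_p)               # compare with 2(n-2)/3 and (16n-29)/90
```

Exact rational arithmetic is used so that the comparison with the formulae is free of rounding; the enumeration is feasible only for small n.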
45.19 Now consider the distribution of phase lengths. To define a phase of
length d (say, a run up) we require d+3 terms, involving a fall from first to second, a
rise from second to third, third to fourth, ..., (d+1)th to (d+2)th, and a fall from (d+2)th
to (d+3)th. Consider the d+3 values arranged in increasing order of magnitude. If
we pick out two other than the first and the last and transfer one to the beginning and
one to the end, we obtain a rising phase of length d. There are $\tfrac{1}{2}(d+1)d$ ways of
picking out the pair, and each may go to either end, so there are $(d+1)d$ rising phases.
But in addition we may put the first member at the end and any of the others except
the second at the beginning, giving us d+1 further cases; or the last member at the
beginning and any except the penultimate at the end, giving (d+1) further cases; and
from this total we must subtract the case where the first is last and the last first, which
has been counted twice. Thus there are

$$(d+1)d + (d+1) + (d+1) - 1 = d^2 + 3d + 1$$

rising phases. The probability of a phase, either rising or falling, is then

$$\frac{2(d^2+3d+1)}{(d+3)!}. \tag{45.13}$$

Now in a series of length n there are n-d-2 possible phases of length d. The
expected number of phases of length d in the set of n values is then

$$N_d = \frac{2(n-d-2)(d^2+3d+1)}{(d+3)!}. \tag{45.14}$$

The expected total number of phases N, from (45.14), is given by

$$N = 2\sum_{d=1}^{n-3}\frac{(n-d-2)(d^2+3d+1)}{(d+3)!}.$$

Now

$$(n-d-2)(d^2+3d+1) = -(d+3)(d+2)(d+1) + (n+1)(d+3)(d+2) - (2n+1)(d+3) + (n+1)$$

and hence

$$N = 2\sum_{d=1}^{n-3}\left\{-\frac{1}{d!} + \frac{n+1}{(d+1)!} - \frac{2n+1}{(d+2)!} + \frac{n+1}{(d+3)!}\right\} = \tfrac{1}{3}(2n-7) + \frac{2}{n!}. \tag{45.15}$$

For all practical purposes we may neglect the second term in (45.15) and hence

$$N = \tfrac{1}{3}(2n-7). \tag{45.16}$$

Since the number of phases is one less than the number of turning points except in
the 2 cases out of n! where both are zero, (45.15) agrees with (45.3). Now

$$\frac{N_d}{N} = \frac{6(n-d-2)(d^2+3d+1)}{(2n-7)(d+3)!}. \tag{45.17}$$

We may derive the moments of this ratio fairly easily. For example,

$$\mu'_1 = \frac{6}{2n-7}\sum_{d=1}^{n-3}\frac{d(n-d-2)(d^2+3d+1)}{(d+3)!} = \frac{6}{2n-7}\sum_{d=1}^{n-3}\left\{-\frac{1}{(d-1)!} + \frac{n+1}{d!} - \frac{3n+2}{(d+1)!} + \frac{5n+3}{(d+2)!} - \frac{3(n+1)}{(d+3)!}\right\}.$$

Remembering the rapid convergence of $\sum 1/x!$ to e, we may to a very close approximation
write this as

$$\mu'_1 = \frac{6}{2n-7}\left\{-e + (n+1)(e-1) - (3n+2)(e-2) + (5n+3)\left(e-\tfrac{5}{2}\right) - 3(n+1)\left(e-\tfrac{8}{3}\right)\right\} = \frac{3(n+7-4e)}{2n-7} \sim \frac{3}{2}. \tag{45.18}$$

Likewise we find that

$$\mu_2 = \frac{3}{(2n-7)^2}\left\{(8e-21)n^2 + (4e-17)n - (48e^2 - 140e + 14)\right\} \sim 0.560. \tag{45.19}$$


45.20 The distribution of which these are the moments does not tend to normality
as n increases (cf. Exercise 45.1). A natural procedure in testing for randomness is
to compare the observed distribution with the expected distribution given by (45.14).
For shorter series, however, there is a theoretical difficulty in that the lengths of phase
are not independent, so that a straightforward $\chi^2$ goodness-of-fit test is not valid. The
question was examined by Wallis and Moore (1941) who came to the conclusion that
for a three-fold classification d = 1, 2, $\geq 3$ (two degrees of freedom) the $X^2$ statistic(*)
can be tested in the ordinary form with $\nu = 2\tfrac{1}{2}$ for $X^2 \geq 6.3$. For lower values $\tfrac{6}{7}X^2$
can be tested in that form with $\nu = 2$.

Wolfowitz (1944) and Levene (1952) showed that the number of phases tends to 
normality and Gleissberg (1945) tabulated the distribution of this number for n<25. 

Example 45.1 

Consider the barley yields of Table 45.1. There are 56 values in this series but 
at two points (1906, 1907 and 1910, 1911) the values in successive years are equal. 
So far as concerns turning points and phases we shall count each of these as one point 
and reduce the number of terms to 54. 

If the reader will mark the peaks and troughs on the table, or count them on Fig. 45.1,
he will find that there are 35 turning points. The expected number, from (45.3), is
$\tfrac{2}{3}(54-2) = 34.67$. This is so close to observation that no further test is necessary.

The distribution of phases will be found to be

Phase      No. of phases     No. of phases, theoretical
length     observed          (45.14), (45.16)

  1            23                 21.25
  2             7                  9.17
  3             4                  2.59

Total          34                 33.67

Again a test is hardly necessary.

(*) See footnote to page 421, Vol. 2

The conclusion would be, on these tests, that the variation in yield from year to
year was random.
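
The theoretical column of the table in Example 45.1 can be reproduced from (45.14). The following Python sketch is an added illustration; the function names are hypothetical. It also evaluates the exact sum of (45.14) over d, which differs from (2n-7)/3 only by the quantity 2/n!, the two monotone orderings being the only ones with no turning point and therefore no phase.

```python
from fractions import Fraction
from math import factorial


def expected_phases(n, d):
    """Expected number of phases of length d in a random series of n terms, (45.14)."""
    return Fraction(2 * (n - d - 2) * (d * d + 3 * d + 1), factorial(d + 3))


def expected_total(n):
    """Exact expected total number of phases: (45.14) summed over d = 1, ..., n-3."""
    return sum(expected_phases(n, d) for d in range(1, n - 2))


n = 54  # barley series after the two tied pairs have each been counted as one point
for d in (1, 2, 3):
    print(d, round(float(expected_phases(n, d)), 2))
print("total", round(float(expected_total(n)), 2))
```

For n = 54 the correction 2/n! is far below any printed precision, which is why the approximation (45.16) suffices in practice.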

45.21 Considered as a test against trend the turning-points test has a poor performance,
and we shall see later (Exercise 45.4) that it has zero efficiency compared
with other tests in certain cases. This is intuitively reasonable, for “ turning ” is a local
property and would not be much affected by whereabouts along a line of gentle trend
development the series had arrived. Considered as a test against cyclicality the test is
obviously better. In a random series the mean interval between turning points is
about 1.5 with a variance (from (45.10) and (10.14)) of about 9/(10n). The test itself
is enough to enjoin further investigation in series of more than 10 terms whenever the
mean interval between turning points is 2 or more.

The power of tests against specific alternatives for runs up and down has been 
investigated by Levene (1952). 
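
The turning-points test can be sketched in a few lines of Python. This is an added illustration, not the authors' procedure in detail: the function name is hypothetical, tied consecutive values are merged into a single term as was done for the barley yields in Example 45.1, and the standardized deviate is meant to be referred to the normal distribution only as an approximation for moderately long series.

```python
from math import sqrt


def turning_point_test(series):
    """Return (n, p, z): effective length, turning points, standardized deviate."""
    # Treat runs of equal consecutive values as a single term, as in Example 45.1.
    u = [series[0]]
    for value in series[1:]:
        if value != u[-1]:
            u.append(value)
    n = len(u)
    p = sum(
        1
        for i in range(1, n - 1)
        if (u[i - 1] < u[i] > u[i + 1]) or (u[i - 1] > u[i] < u[i + 1])
    )
    mean = 2 * (n - 2) / 3           # (45.3)
    variance = (16 * n - 29) / 90    # variance of p found in 45.18
    return n, p, (p - mean) / sqrt(variance)


n, p, z = turning_point_test([1, 3, 2, 2, 4, 3, 5, 6])
print(n, p, round(z, 3))
```

A large negative z (too few turning points) points towards trend or slow oscillation; a large positive z points towards rapid oscillation.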

The difference-sign test 

45.22 A somewhat more laborious test consists of counting the number of positive 
first differences of the series, that is to say, the number of points where the series 
increases. (As before, we shall ignore points where there is neither increase nor de¬ 
crease.) With a series of n terms we have n— 1 differences. Let us, as before, define 
a variable 

$$X_i = \begin{cases} 1, & u_{i+1} > u_i, \\ 0, & u_{i+1} < u_i; \end{cases} \qquad i = 1, 2, \ldots, (n-1). \tag{45.20}$$

Then the number of points of increase, say c, is given by

$$c = \sum_{i=1}^{n-1} X_i.$$

For a random series we have immediately

$$E(c) = (n-1)E(X_i) = \tfrac{1}{2}(n-1). \tag{45.21}$$

Likewise

$$E(c^2) = E\left\{\sum_{n-1} X_i^2 + 2\sum_{n-2} X_i X_{i+1} + \sum_{(n-2)(n-3)} X_i X_j\right\}, \qquad j \neq i,\ i \pm 1,$$

$$= (n-1)E(X_i^2) + 2(n-2)E(X_i X_{i+1}) + (n-2)(n-3)E(X_i X_j)$$

$$= \tfrac{1}{2}(n-1) + 2(n-2)E(X_i X_{i+1}) + \tfrac{1}{4}(n-2)(n-3). \tag{45.22}$$

To evaluate $E(X_i X_{i+1})$ we consider permutations of three. Only in one case out of six
does this give a non-vanishing contribution. Hence, from (45.22) and (45.21) we find

$$\operatorname{var} c = \tfrac{1}{2}(n-1) + \tfrac{1}{3}(n-2) + \tfrac{1}{4}(n-2)(n-3) - \tfrac{1}{4}(n-1)^2 = \tfrac{1}{12}(n+1). \tag{45.23}$$

The distribution tends fairly rapidly to normality (cf. Exercise 45.3). It has been 
tabulated by Moore and Wallis (1943). 
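
A minimal Python sketch of the difference-sign test follows. It is an added illustration with a hypothetical function name; repeated consecutive values are dropped before counting, which is one way (assumed here) of ignoring points of neither increase nor decrease, and the normal approximation to the standardized deviate is assumed adequate.

```python
from math import sqrt


def difference_sign_test(series):
    """Return (c, z): number of increases and its standardized deviate."""
    # Ignore points of neither increase nor decrease by dropping repeated values.
    u = [series[0]]
    for value in series[1:]:
        if value != u[-1]:
            u.append(value)
    n = len(u)
    c = sum(1 for i in range(n - 1) if u[i + 1] > u[i])
    mean = (n - 1) / 2        # (45.21)
    variance = (n + 1) / 12   # (45.23)
    return c, (c - mean) / sqrt(variance)


print(difference_sign_test([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
print(difference_sign_test([1, 3, 2, 4, 3, 5]))
```

The second call illustrates the weakness discussed next: a regular up-and-down movement gives a count close to its expectation.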

45.23 This test is clearly useless against an alternative of symmetrical oscillation
where the number of movements up will approximate to the number of movements
down. It has been advocated mainly as a test against trend, and especially against
linear trend. As such it is better than the turning-points test but very inferior to
other tests based on rank correlation, which we consider below.

Consider, in fact, a series with a linear trend and a random residual

$$u_t = \alpha + \beta t + \varepsilon_t, \tag{45.24}$$

where $\varepsilon_t$ is a normal variable with zero mean and unit variance. We can regard this as
a regression of u on t in the case where t takes the equidistant values 1, 2, ..., n.
In this situation we should estimate $\beta$ by

$$b = \frac{\sum (u_t - \bar{u})(t - \bar{t})}{\sum (t - \bar{t})^2}, \tag{45.25}$$

which is unbiassed and has variance

$$\operatorname{var} b = \frac{1}{\sum (t-\bar{t})^2} = \frac{12}{n(n^2-1)} \sim \frac{12}{n^3}. \tag{45.26}$$

We now use the asymptotic relative efficiency (cf. 25.5-6) to compare other consistent
tests with that based on b. Since b is unbiassed, $[\partial E(b)/\partial\beta]_{\beta=0} = 1$, and (25.16)
becomes, with m = 1,

$$[E'(b)]_{\beta=0}/(\operatorname{var} b)^{1/2} \sim (n^3/12)^{1/2}. \tag{45.27}$$

Thus, for the statistic b, $\delta$ defined by (25.16) takes the value

$$\delta = \tfrac{3}{2}. \tag{45.28}$$

We now compute $\delta$ for the difference-sign test statistic.
Consider the “ marker ” variable

$$H_{ij} = \begin{cases} 1, & u_i > u_j, \\ 0, & u_i < u_j, \end{cases} \qquad \text{with } H_{ij} + H_{ji} = 1. \tag{45.29}$$

The expectation of $H_{ij}$ is the probability that $H_{ij}$ equals unity, and since $u_i - u_j$ is
a normal variable with mean $\beta(i-j)$ and variance 2, this is equal to

$$E(H_{ij}) = \frac{1}{\sqrt{2\pi}\,\sqrt{2}}\int_0^\infty \exp\left[-\tfrac{1}{4}\{x - \beta(i-j)\}^2\right]dx = \frac{1}{\sqrt{2\pi}}\int_{-\beta(i-j)/\sqrt{2}}^{\infty} \exp\left(-\tfrac{1}{2}x^2\right)dx,$$

so that

$$\left[\frac{\partial E(H_{ij})}{\partial\beta}\right]_{\beta=0} = \frac{i-j}{\sqrt{2}\,\sqrt{2\pi}} = \frac{i-j}{2\sqrt{\pi}}. \tag{45.30}$$

This is all we require for the difference-sign test, but for later purposes we proceed
further. Since $H_{ij}$ and $H_{kl}$ are independent, i, j, k, l unequal, we have

$$\left[\frac{\partial E(H_{ij}H_{kl})}{\partial\beta}\right]_{\beta=0} = \left[E(H_{ij})\frac{\partial E(H_{kl})}{\partial\beta} + E(H_{kl})\frac{\partial E(H_{ij})}{\partial\beta}\right]_{\beta=0} = \frac{(i-j)+(k-l)}{4\sqrt{\pi}}. \tag{45.31}$$

Consider now $H_{ij}H_{jk}$. Since $u_i - u_j$ and $u_j - u_k$ are jointly normally distributed with
correlation $-\tfrac{1}{2}$ we have

$$E(H_{ij}H_{jk}) = \operatorname{Prob}(H_{ij} = 1, H_{jk} = 1) = \int_{-\beta(i-j)/\sqrt{2}}^{\infty}\int_{-\beta(j-k)/\sqrt{2}}^{\infty} \frac{1}{2\pi\sqrt{3/4}}\exp\left\{-\tfrac{2}{3}(x^2 + xy + y^2)\right\}dx\,dy$$

and

$$\left[\frac{\partial E(H_{ij}H_{jk})}{\partial\beta}\right]_{\beta=0} = \frac{i-k}{4\sqrt{\pi}}. \tag{45.32}$$

Similarly

$$\left[\frac{\partial E(H_{ij}H_{ik})}{\partial\beta}\right]_{\beta=0} = \frac{2i-j-k}{4\sqrt{\pi}}. \tag{45.33}$$

Finally we require the similar expression when $u_{i+2} - u_{i+1}$ and $u_{i+1} - u_i$ are of opposite
signs. This is the probability that $(u_{i+2} - u_{i+1})(u_{i+1} - u_i)$ is negative and is seen to be

$$E = 2\int_{-\infty}^{-\beta/\sqrt{2}}\int_{-\beta/\sqrt{2}}^{\infty} \frac{1}{2\pi\sqrt{3/4}}\exp\left\{-\tfrac{2}{3}(x^2 + xy + y^2)\right\}dx\,dy,$$

whence we find

$$\left[\frac{\partial E}{\partial\beta}\right]_{\beta=0} = 0. \tag{45.34}$$

Reverting to the difference-sign test, we have that the number of positive increments
in the series (cf. (45.20)) is

$$c = \sum_{i=1}^{n-1} H_{i+1,\,i}.$$

Thus from (45.30),

$$\left[\frac{\partial E(c)}{\partial\beta}\right]_{\beta=0} = \sum_{i=1}^{n-1}\frac{1}{2\sqrt{\pi}} = \frac{n-1}{2\sqrt{\pi}}. \tag{45.35}$$

Remembering that the variance is, from (45.23), (n+1)/12, we have

$$[E'(c)]_{\beta=0}/(\operatorname{var} c)^{1/2} \sim \left(\frac{3n}{\pi}\right)^{1/2}. \tag{45.36}$$

Thus, for the difference-sign test, $\delta$ defined at (25.16), Vol. 2, takes the value

$$\delta = \tfrac{1}{2} \tag{45.37}$$

and comparison of (45.37) and (45.28) shows that c has zero asymptotic relative
efficiency, by (25.24).

Mann (1945a) gave a lower bound to the power of the test. Stuart (1952) has
tabulated the power of the test against the normal regression alternative at the 95 per
cent level.
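
The derivative (45.30) and the contrast between the orders of magnitude in (45.27) and (45.36) can be checked numerically. The Python sketch below is an added illustration; the function names and the step length h are assumptions, and the probability that $u_i > u_j$ is written in terms of the error function from the normal model (45.24).

```python
from math import erf, pi, sqrt


def prob_h(i, j, beta):
    """P(u_i > u_j) under u_t = alpha + beta*t + e_t with unit-variance normal e_t."""
    # u_i - u_j is normal with mean beta*(i - j) and variance 2.
    return 0.5 * (1.0 + erf(beta * (i - j) / 2.0))


def derivative_at_zero(i, j, h=1e-6):
    """Central-difference estimate of dE(H_ij)/d(beta) at beta = 0."""
    return (prob_h(i, j, h) - prob_h(i, j, -h)) / (2.0 * h)


def efficacy_b(n):
    """E'(b) / sqrt(var b) at beta = 0, from (45.26)."""
    return sqrt(n * (n * n - 1) / 12.0)


def efficacy_c(n):
    """E'(c) / sqrt(var c) at beta = 0, from (45.35) and (45.23)."""
    return (n - 1) / (2.0 * sqrt(pi)) / sqrt((n + 1) / 12.0)


print(derivative_at_zero(5, 2), 3 / (2 * sqrt(pi)))   # the two should agree closely
for n in (10, 100, 1000):
    print(n, efficacy_c(n) / efficacy_b(n))            # ratio shrinks roughly like 1/n
```

The shrinking ratio is the numerical counterpart of the statement that the difference-sign test has zero asymptotic relative efficiency against the regression test.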


Rank correlation tests 

45.24 There is a prior presumption that we shall improve our test still further
if we compare, not merely neighbouring pairs as in the difference-sign test, but all
pairs. Given a set of values $u_1, u_2, \ldots, u_n$, in that order, let us count the number
of pairs in which $u_j > u_i$, $j > i$. If this is P, we note that there are $\tfrac{1}{2}n(n-1)$ pairs and
that the expected number in a random series is $\tfrac{1}{4}n(n-1)$. The excess of P over this number
indicates a tendency to positive trend, a deficiency corresponding to a negative trend.
In fact, this quantity is a simple linear function of the rank correlation coefficient $\tau$(*)
defined at (31.23), Vol. 2, between the order of the variables in time and their order of
magnitude u. For a random series the variance of $\tau$ is known. If Q is the complementary
quantity to P, namely the number of values for which $u_j < u_i$, $j > i$, we have

$$\tau = 1 - \frac{4Q}{n(n-1)} \tag{45.38}$$

and from (31.33-4),

$$E(\tau) = 0, \tag{45.39}$$

$$\operatorname{var}\tau = \frac{2(2n+5)}{9n(n-1)}. \tag{45.40}$$

The distribution of $\tau$ tends rapidly to normality (cf. 31.26).
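
The counts P and Q and the coefficient of (45.38) are easily computed by comparing all pairs. The Python sketch below is an added illustration with a hypothetical function name; it assumes that there are no tied values and that the normal approximation to the null distribution is acceptable.

```python
from math import sqrt


def kendall_trend_test(u):
    """Return (P, Q, tau, z) for the series u, comparing every pair i < j."""
    n = len(u)
    p = sum(1 for i in range(n) for j in range(i + 1, n) if u[j] > u[i])
    q = sum(1 for i in range(n) for j in range(i + 1, n) if u[j] < u[i])
    tau = 1 - 4 * q / (n * (n - 1))                 # (45.38); assumes no ties
    variance = 2 * (2 * n + 5) / (9 * n * (n - 1))  # (45.40)
    return p, q, tau, tau / sqrt(variance)


print(kendall_trend_test([3, 1, 2]))
print(kendall_trend_test([1, 2, 3, 4]))
```

The double loop costs of the order of n squared comparisons, which is the extra labour relative to the difference-sign and turning-point counts.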

45.25 In the notation of 45.23 we may write

$$Q = \sum_{i<j} H_{ij}.$$

In the case of an alternative linear trend (45.24) with normal residuals, we then have
from (45.30),

$$\left[\frac{\partial E(Q)}{\partial\beta}\right]_{\beta=0} = \frac{1}{2\sqrt{\pi}}\sum_{i<j}(i-j) = -\frac{1}{2\sqrt{\pi}}\cdot\frac{n(n^2-1)}{6}.$$

Also, from (45.40),

$$\operatorname{var} Q \sim \frac{n^3}{36},$$

so that

$$\left|[E'(Q)]_{\beta=0}\right|/(\operatorname{var} Q)^{1/2} \sim \{n^3/(4\pi)\}^{1/2}. \tag{45.41}$$

Substituting (45.41) and (45.27) in (25.27) we have for the ARE of $\tau$ (or Q) relative to
the regression estimator,

$$A = \lim_{n\to\infty}\left\{\frac{\left|[E'(\tau)]_{\beta=0}\right|/(\operatorname{var}\tau)^{1/2}}{[E'(b)]_{\beta=0}/(\operatorname{var} b)^{1/2}}\right\}^{1/\delta} = \left(\frac{3}{\pi}\right)^{1/3} \doteq 0.98, \qquad \delta = \tfrac{3}{2}. \tag{45.42}$$

The statistic $\tau$ is therefore very efficient in this case.

Mann (1945b) considered the $\tau$ test for variables $u_i$ such that $P(u_i > u_j) = \tfrac{1}{2} + \varepsilon$ for
i < j, satisfying certain other conditions, and (cf. Exercise 31.8) gave conditions for
the test to be unbiassed, and an example in which it is the most powerful test.

(*) We should normally write t for the statistic, but as this would cause confusion here with
the time variable we depart temporarily from our convention not to employ Greek letters
for sample values.
45.26 We can also calculate the Spearman rank correlation $r_s$ (31.22, Vol. 2) in
this case. It may be written (cf. (31.40))

$$r_s = 1 - \frac{12V}{n(n^2-1)}, \tag{45.43}$$

where

$$V = \sum_{i<j}(j-i)H_{ij}.$$

As at (31.22) we have, for a random series,

$$E(r_s) = 0, \tag{45.44}$$

$$\operatorname{var} r_s = \frac{1}{n-1}, \tag{45.45}$$

so that

$$\operatorname{var} V \sim \frac{n^5}{144}. \tag{45.46}$$

We then find

$$E(V) = \sum_{i<j}(j-i)E(H_{ij}),$$

$$\left[\frac{\partial E(V)}{\partial\beta}\right]_{\beta=0} = -\frac{n^2(n^2-1)}{24\sqrt{\pi}}, \tag{45.47}$$

and exactly as at (45.42) we find

$$A = \left(\frac{3}{\pi}\right)^{1/3} \doteq 0.98. \tag{45.48}$$

The Spearman coefficient has, then, the same ARE as $\tau$.

45.27 Both $\tau$ and $r_s$ are more troublesome to calculate than the difference-sign
or the turning-point statistics; and in practice $r_s$ is easier to calculate than $\tau$, using its
form (31.21).


Example 45.2 

Not to overburden ourselves with arithmetic, let us take the first twenty-five terms
in Table 45.2 (years 1813-1837). In order of magnitude the values of $u_t$ are

Rank     Rank                  Rank     Rank                  Rank     Rank
(time)   (magn.)   Diff.²      (time)   (magn.)   Diff.²      (time)   (magn.)   Diff.²

   1        9        64          10       11         1          19       21         4
   2       18       256          11       13         4          20        2       324
   3        4         1          12       25       169          21       15        36
   4       23       361          13        8        25          22        3       361
   5       10        25          14        5        81          23       14        81
   6       12        36          15        7        64          24       20        16
   7       19       144          16       22        36          25        1       576
   8        6         4          17       17         0
   9       24       225          18       16         4          Total            2898

$$V = \tfrac{1}{2}(2898) = 1449,$$

$$r_s = 1 - \frac{(12)(1449)}{15600} = -0.115.$$

- ->»■'“ « ® s > - " -»» 

perfect agreement. 
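
The arithmetic of Example 45.2 can be reproduced directly from the ranks in the table. The Python sketch below is an added illustration; the variable names are hypothetical. It computes the sum of squared rank differences and, independently, V as the weighted count of inversions $\sum_{i<j}(j-i)H_{ij}$, which should equal half that sum.

```python
# Ranks in order of magnitude for the 25 years 1813-1837, read from the table above.
ranks = [9, 18, 4, 23, 10, 12, 19, 6, 24, 11, 13, 25, 8,
         5, 7, 22, 17, 16, 21, 2, 15, 3, 14, 20, 1]
n = len(ranks)

# Sum of squared differences between time order (1..n) and rank in magnitude.
sum_d2 = sum((t - r) ** 2 for t, r in enumerate(ranks, start=1))

# V as a weighted count of inversions: sum over i < j of (j - i) when u_i > u_j.
v = sum(j - i for i in range(n) for j in range(i + 1, n) if ranks[i] > ranks[j])

r_s = 1 - 12 * v / (n * (n * n - 1))
print(sum_d2, v, round(r_s, 4))
```

In practice the squared-difference form is the quicker route to $r_s$; the inversion form is the one that connects $r_s$ with the marker variables $H_{ij}$ of 45.23.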

45.28 We may also mention, without entering into very great detail, two other 
tests which have some interest: 

(a) The records test. An observation $u_t$ is called an upper (lower) record if it exceeds
(is smaller than) all previous observations in the series. The number of records
appearing as we go along the series provides a test statistic which can be compared
with the distribution from a random series. The subject has been explored by
Foster and Stuart (1954), some of whose results are presented in Exercises 45.8-9.
It appears that, as a test against trend, the records test is more powerful than the
difference-sign test or the turning-points test, but is considerably less powerful
than the $\tau$ or $r_s$ tests (Stuart, 1956, 1957).

(b) The rank serial correlation test. This is a special case of a type of statistic,
the serial correlation coefficient, which we shall introduce in 45.32 below. If
the ranks of a set of n quantities measured about the mean $\tfrac{1}{2}(n+1)$ are $d_i$, i = 1,
2, ..., n, the coefficient of order k is defined by

$$r_k = \frac{\dfrac{1}{n-k}\displaystyle\sum_{i=1}^{n-k} d_i d_{i+k}}{\tfrac{1}{12}(n^2-1)}. \tag{45.49}$$

So far as a test statistic is concerned we may use simply

$$W_k = \sum_{i=1}^{n-k} d_i d_{i+k}. \tag{45.50}$$

The coefficient W k is the covariance (multiplied by n — k) of the terms in the rank- 
series distance k units apart. For a random series its expected value is zero. As 
a test of trend the coefficient has zero efficiency against the normal linear regression alter¬ 
native (cf. Noether (1950), Stuart (1954, 1956)), although it was suggested as a test 
against trend by Wald and Wolfowitz (1943). 
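
The records statistics described under (a) can be sketched as follows. This Python fragment is an added illustration with hypothetical function names; it counts records from the second observation onwards, takes s as the total number of records and d as the number of upper records less the number of lower records, and uses the null moments stated in Exercise 45.9. Distinct values are assumed.

```python
def records_statistics(x):
    """Return (s, d): total number of records and upper-minus-lower records."""
    highest = lowest = x[0]
    s = d = 0
    for value in x[1:]:          # counting starts at the second observation
        if value > highest:
            highest = value
            s += 1
            d += 1
        elif value < lowest:
            lowest = value
            s += 1
            d -= 1
    return s, d


def null_moments(n):
    """E(s), var(s), var(d) for a random series of n terms (cf. Exercise 45.9)."""
    h1 = sum(1.0 / r for r in range(2, n + 1))
    h2 = sum(1.0 / r ** 2 for r in range(2, n + 1))
    return 2 * h1, 2 * h1 - 4 * h2, 2 * h1


print(records_statistics([3, 1, 4, 2, 5]))
print(null_moments(5))
```

Because the expected number of records grows only like the logarithm of n, the statistics use little of the information in a long series, which is consistent with the modest power noted above.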


45.29 If a series is random we shall clearly be able to test it equally well by ignoring 
certain terms, e.g. by taking every other term or every twelfth term. To look at this 
from another viewpoint, if our series is only recorded at periodic intervals instead of 
in toto, our tests remain valid. We lose information, of course, but not validity in 
the tests. The same is true of aggregative series; for example, if we have the (annual) 
aggregate of twelve monthly records, each of which may be regarded as a member of
a random series, the annual figures are also random. On the other hand, randomness 

in an annual series does not rule out the possibility of seasonal movements in the 
constituent series. 
45.30 Some series, when examined under the microscope, present an
irregular fluctuation which looks like a kind of randomness in the limit as the interval
between successive observations becomes smaller. On the other hand, the series are
continuous, and we arrive at the question whether it is possible to have a continuous
random series. In our view, it is not. There is, to our mind, something essentially
discontinuous in the idea of independence of successive observations; continuity would
destroy independence. We can imagine a set of points, each determining the value
of a random variable, becoming ever closer together, and the variance of the variable
diminishing so that the total range of variation remains within finite bounds. But it
does not appear possible to proceed to the limit in the way that the mathematician
proceeds from an enumerable to a dense set of points on a line.


45.31 For most, if not all, practical series we can imagine them as continuous to
the eye but discontinuous under the microscope. Pressure may be thought of as a
continuous variable but on a sufficiently small time-scale is discontinuous, being the
result of impacts of individual molecules of gas. The profile of the cotton fibre may
be similarly imagined as continuous, although ultimately composed of discontinuous
particles. In this sense we may, perhaps, speak of a continuous random series, but
it is a form of expression which we shall have to watch very carefully. To test such
a series for randomness we may take observations at any suitable interval and carry out 
on the resultant one of the tests we have discussed earlier in the chapter. A discussion 
of more refined methods of approach must await our account of correlogram and 
spectral analyses. 


Serial correlations 


45.32 For series which are not random there will be dependencies of one kind or 
another between successive terms. One very useful measure of this effect is the product- 
moment correlation between successive observations. Given n values $u_1, u_2, \ldots, u_n$,
the so-called serial correlation of lag 1 is defined by

$$r_1 = \frac{\dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n-1}\left(u_i - \dfrac{1}{n-1}\sum_{i=1}^{n-1}u_i\right)\left(u_{i+1} - \dfrac{1}{n-1}\sum_{i=1}^{n-1}u_{i+1}\right)}{\left[\dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n-1}\left(u_i - \dfrac{1}{n-1}\sum_{i=1}^{n-1}u_i\right)^2 \cdot \dfrac{1}{n-1}\sum_{i=1}^{n-1}\left(u_{i+1} - \dfrac{1}{n-1}\sum_{i=1}^{n-1}u_{i+1}\right)^2\right]^{1/2}}. \tag{45.51}$$

Likewise, the serial correlation of lag k is the correlation between pairs of terms
k units apart, viz.

$$r_k = \frac{\dfrac{1}{n-k}\displaystyle\sum_{i=1}^{n-k}\left(u_i - \dfrac{1}{n-k}\sum_{i=1}^{n-k}u_i\right)\left(u_{i+k} - \dfrac{1}{n-k}\sum_{i=1}^{n-k}u_{i+k}\right)}{\left[\dfrac{1}{n-k}\displaystyle\sum_{i=1}^{n-k}\left(u_i - \dfrac{1}{n-k}\sum_{i=1}^{n-k}u_i\right)^2 \cdot \dfrac{1}{n-k}\sum_{i=1}^{n-k}\left(u_{i+k} - \dfrac{1}{n-k}\sum_{i=1}^{n-k}u_{i+k}\right)^2\right]^{1/2}}. \tag{45.52}$$

In practice (and also for theoretical convenience) it makes for simplicity to modify
these definitions to some extent. Instead of measuring the first (n-k) u's about their
mean, we may measure about the mean of the whole set of observations; and similarly
for the values at the end. Similarly, instead of taking separate variances in the
denominator of (45.52) we may use the variance of the whole series. Thus, writing
$\bar{u}$ for the mean of the whole series, we put

$$r_k = \frac{\dfrac{1}{n-k}\displaystyle\sum_{i=1}^{n-k}(u_i-\bar{u})(u_{i+k}-\bar{u})}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}(u_i-\bar{u})^2}. \tag{45.53}$$

This is the form we shall mostly use. For series of moderate length the difference
from (45.52) is negligible, but we must be careful not to use (45.53) for short series
where accuracy in estimation is necessary. In particular, values of r greater than
unity may occur.

45.33 The array of coefficients $r_k$ tells us a good deal about the
nature of the internal dependence of the series. The totality is called the correlogram,
and is usually presented as a graph of $r_k$ against k as abscissa. For a random series
the $r_k$ ($k \neq 0$) will be zero within sampling limits.
We note that, by definition, $r_0 = 1$ and $r_{-k} = r_k$.

45.34 For certain theoretical purposes, and for computational convenience in
a minor way, the definition (45.53) may be modified still further. For a coefficient
of order k there are only n-k terms in the numerator. Suppose we put

$$u_{n+j} = u_j. \tag{45.54}$$

We may then sum the product-moment in the numerator over n terms to obtain

$$r_k = \frac{\displaystyle\sum_{i=1}^{n}(u_i-\bar{u})(u_{i+k}-\bar{u})}{\displaystyle\sum_{i=1}^{n}(u_i-\bar{u})^2}. \tag{45.55}$$

We are obviously here distorting the data by assuming (45.54). But if k is small
compared to n we are not distorting it very much. The coefficient $r_k$ of (45.55) is
called a circular serial correlation. The point of introducing it will become evident
in 47.30 and Chapter 48.
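
The two working definitions, (45.53) and the circular form (45.55), can be sketched in Python as follows. This is an added illustration; the function names are hypothetical, and the series is assumed not to be constant, so that the denominators do not vanish.

```python
def serial_correlation(u, k):
    """Serial correlation of lag k in the form (45.53)."""
    n = len(u)
    mean = sum(u) / n
    dev = [value - mean for value in u]
    numerator = sum(dev[i] * dev[i + k] for i in range(n - k)) / (n - k)
    denominator = sum(x * x for x in dev) / n
    return numerator / denominator


def circular_serial_correlation(u, k):
    """Circular serial correlation (45.55), wrapping the series round on itself."""
    n = len(u)
    mean = sum(u) / n
    dev = [value - mean for value in u]
    numerator = sum(dev[i] * dev[(i + k) % n] for i in range(n))
    return numerator / sum(x * x for x in dev)


series = [1.0, 2.0, 3.0, 4.0]
print(serial_correlation(series, 1), circular_serial_correlation(series, 1))
```

For so short a series the two forms differ markedly, since the circular form adds the product of the last and first deviations; the difference becomes small when k is small compared with n.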

45.35 In concluding this chapter we may, for the avoidance of confusion, refer
to series which are not of the kind we have been considering. Traffic accidents,
outbreaks of epidemics, etc., may happen from time to time in a certain area and thus
over a period constitute a series of events. The intervals between them are, usually,
irregular but may nevertheless have a distribution which exhibits patterns of behaviour.
Here the variables of interest are the intervals between happenings, distributed in time,
rather than the happenings themselves, and this topic is really distinct from that of
time-series, which concerns a complex moving through time
and observed at specified intervals.
EXERCISES

45.1 In the distribution of phases of 45.19 show that the moment-generating function of
$N_d/N$ for large n is given by

$$3(e^{-\theta} - 2e^{-2\theta} + e^{-3\theta})(g - 1 - e^{\theta}) + \tfrac{5}{2} - \tfrac{3}{2}e^{-\theta},$$

where $g = \exp(e^{\theta})$.

Hence verify that $\mu'_1 = 1.5$, $\kappa_2 = 0.560$, and show that $\kappa_3 = 0.677$, $\kappa_4 = 0.904$. The
distribution is thus not normal.

45.2 For the distribution of positive first differences (from a random series) of 45.22 show
that odd order cumulants vanish, and that

$$\kappa_4 = -(n+1)/120.$$

45.3 In the n! permutations of n numbers show that if $P_n(S)$ is the number with S positive
differences,

$$P_n(S) = (S+1)P_{n-1}(S) + (n-S)P_{n-1}(S-1).$$

Hence obtain the recurrence expression for the moments

$$E_n(x^{2i+1}) = 0,$$

$$E_n(x^{2i}) = \frac{n+1}{n}E_{n-1}\left(x+\tfrac{1}{2}\right)^{2i} - \frac{2}{n}E_{n-1}\left(x+\tfrac{1}{2}\right)^{2i+1},$$

where

$$x = S - E(S).$$

Hence by induction show that

$$\lim_{n\to\infty}\frac{\mu_{2i}(x)}{\{\mu_2(x)\}^{i}} = (2i-1)(2i-3)\cdots 3\cdot 1$$

and thus that the distribution tends to normality.

(Mann, 1945a)

45.4 Show that, against the normal regression alternative, the turning-point test based
on p defined at (45.2) has

$$\{E'(p)\}_{\beta=0} = 0, \qquad \{E''(p)\}_{\beta=0} \neq 0,$$

and $\delta$ defined at (25.16) $= \tfrac{1}{4}$, so that its ARE, compared to the regression coefficient test, is
zero; and that it is also zero compared to the difference-sign test.

(Stuart, 1954, 1956)


45.5 Observing that, in the notation of 45.23, a rank n may be expressed as 


ri = S Hi), 

i=i 

show that, for the normal regression alternative, 

9 




Hence, for the statistic W of (45.50), 


l T^ W) l 

l_9/3 _l/?=o 


= 0, 


”9* 1 n 5 


and 


var W ~ n 5 /144, 


and thus that W has ^ = | and zero ARE compared to the regression esffmator^^^ ^ 





<CED T HbU from a S et of continuous distrib u 


If values ui, t 
with frequency 
the series >h> Ui ’ 


A nv A WCED 1 _ from a set of continuous distrib u 

THE „ are chosen■»' ^°,ed number c of pom, s of incre*. 

)’ show that t e 

?- by , ,rM“* du,du,+ “ 

j ft+i C«f+i) j __ ^ 


For the 


- 0 elsewheie, 


lim i S(o) - i{l+«2-' e ' )} ’ 
*-**" «<-l, 




as 0, »<-*» 

-1, *>*’ 

_ • e 45 6 show that for the normal distribution 

57 As in Exercise 45.o, snow 

/■ .i = —-—r exp > 

Jim * 2?(c) = V2), 


(Levene, 1952) 


where 


._!_r »-» 

\Z( 2 ti)J —oo 


^ " V(2^) 

and the trend is given by «» = (f — 1 ) 0 - 




(Levene, 1952, who tabulates the values) 


45.8 For a random series $x_1, x_2, \ldots, x_n$ define

$u_r$ = 1 if the rth observation is an upper record,
      = 0 otherwise;

$l_r$ = 1 if the rth observation is a lower record,
      = 0 otherwise;

$s_r = u_r + l_r$, $d_r = u_r - l_r$.

Define also

$$s = \sum_{r=2}^{n} s_r, \qquad d = \sum_{r=2}^{n} d_r,$$

so that the counting begins at the second observation.
Then if $p^{(r)}(s, d)$ is the joint distribution of s and d in series of r observations,
show that

$$p^{(r)}(s,d) = \frac{r-2}{r}\,p^{(r-1)}(s,d) + \frac{1}{r}\,p^{(r-1)}(s-1,d-1) + \frac{1}{r}\,p^{(r-1)}(s-1,d+1), \qquad p^{(1)}(0,0) = 1,$$

and hence derive the joint probability-generating function of s and d.

Hence derive, for the characteristic functions of s and d,

$$\phi_s^{(n)}(t) = \frac{1}{n!}\prod_{r=0}^{n-2}(r + 2e^{it}),$$

$$\phi_d^{(n)}(t) = \frac{1}{n!}\prod_{r=0}^{n-2}(r + 2\cos t).$$

Derive also the joint c.f. and by inversion show that

$$p^{(n)}(s,d) = p_1^{(n)}(s)\,2^{-s}\binom{s}{\tfrac{1}{2}(s+d)},$$

where $p_1$ is the frequency function of s given by

$$p_1^{(n)}(s) = \frac{2^s}{n!}\,\mathfrak{S}^{(n-2)}(n-s-1)$$

and $\mathfrak{S}^{(n)}(r)$ denotes the sum of products of all selections of r integers out of 1, 2, ..., n.

(Foster and Stuart, 1954)


45.9 In the foregoing exercise show that

$$E(d) = 0, \qquad \operatorname{var} d = 2\sum_{r=2}^{n}\frac{1}{r},$$

$$E(s) = 2\sum_{r=2}^{n}\frac{1}{r}, \qquad \operatorname{var} s = 2\sum_{r=2}^{n}\frac{1}{r} - 4\sum_{r=2}^{n}\frac{1}{r^2}.$$

(Foster and Stuart, 1954)

45.10 Show that $r_k$ of (45.55) cannot exceed unity, but that $r_k$ of (45.53) may do so.

45.11 For the coefficient of (45.53) calculated from a random series show that

$$E(r_k) = -\frac{1}{n-1}$$

and hence that $r_k$ is biassed as an estimator of serial correlation in the parent series.


CHAPTER 46 

TIME-SERIES: TREND AND SEASONALITY 

46.1 It is an essential part of the concept of trend that the movement over fairly long periods is smooth. This means that we can represent the trend component, at least locally, by a polynomial in the time element $t$. Thus, given the series $u_t$, we may, in the first instance, seek for some polynomial

$$u_t = a_0 + a_1 t + a_2 t^2 + \cdots + a_p t^p \tag{46.1}$$

which will give an account of the trend movement. By taking $p$ great enough we can, of course, obtain as close a representation as we like to a finite series; and how large we take $p$ is a matter for decision in particular cases.

We need not restrict ourselves to polynomials, although they are the most convenient mathematically. Any suitable function of the time can be taken, though we should naturally choose one which itself moved in a trend-like way. Growth curves, for example, may be represented by exponential functions, and population curves like that of Table 45.5 are sometimes represented by the logistic curve of type

“W = f+ps- (46.2) 


1 + e- 


46.2 If a polynomial is fitted to the whole series by least squares, it evidently gives the curvilinear regression line of $u_t$ on the variable $t$. It is, however, clear that to obtain a satisfactory trend-curve for data such as that of Table 45.4 (sheep population) we should have to take a polynomial of rather high order or a somewhat complicated more general function. This may appear somewhat artificial and in any case the coefficients of such a polynomial, being based on high-order moments, would be very unstable from the sampling viewpoint. A more practical objection, though by no means an unimportant one, is that if we add another term to the series, as for example when the same series is brought up to date, the trend-line may be altered throughout its length. It is therefore obviously more convenient to use the simpler methods described below.

Moving averages 

46.3 An alternative to finding a polynomial which will represent the whole series is to determine a polynomial which will represent a part of it, and to use different polynomials for different parts. The simplest method, and one which forms the basis of the majority of methods of trend-fitting, is to take the first $n$ terms ($n$ being chosen at will), fit a polynomial of degree $p$, not greater than $n-1$, to them, and use that polynomial to determine the value in the middle of its range; then to repeat the operation with the $n$ terms from the second to the $(n+1)$th, and so on, moving on one term at each stage. Unless other considerations require it, we take $n$ to be odd, so that the middle point of the range corresponds in time to a value which is actually observed. Otherwise the middle point falls half-way between two observed values, or we have to use some value of the fitted polynomial other than the middle point, which results in a loss of useful symmetry.


46.4 Suppose, then, that the number of terms is chosen to be odd and is denoted by $2m+1$. Without loss of generality we may denote the terms by $u_{-m}, u_{-m+1}, \ldots, u_0, \ldots, u_{m-1}, u_m$. If we choose to fit to them a polynomial of degree $p$ we may, in the usual way, determine the coefficients by least squares, i.e. solve the equations

$$\frac{\partial}{\partial a_j}\sum_{t=-m}^{m}\,(u_t - a_0 - a_1 t - \cdots - a_p t^p)^2 = 0, \qquad j = 0, 1, \ldots, p, \tag{46.3}$$

which will give us equations typified by

$$\sum t^j u_t - a_0\sum t^j - a_1\sum t^{j+1} - \cdots - a_p\sum t^{j+p} = 0. \tag{46.4}$$

The sums $\sum t^j$ are functions of $m$ only. Thus, if we solve (46.4) for $a_0$ we shall find an equation of the form

$$a_0 = c_1 u_{-m} + c_2 u_{-(m-1)} + \cdots + c_{2m+1} u_m, \tag{46.5}$$

where the $c$'s depend on $m$ and $p$, but not on the $u$'s.

Now $u_t$ assumes the value $a_0$ at $t = 0$ and hence this value, as given by (46.5), is the value we require for the polynomial. As we see, this is equivalent to a weighted average of the observed values, the weights being independent of which part of the series is taken. Thus our process of fitting a trend-line consists of determining the constants $c$ (which depend on $m$ and $p$ and therefore leave us some freedom of choice) and then calculating, for each consecutive set of $(2m+1)$ terms in the series, a value given by (46.5). If the terms are $u_1, u_2, \ldots, u_{2m+1}$, the calculated value will correspond to $t = m+1$. A supplementary procedure is necessary to give values corresponding to the $m$ terms at the beginning and the $m$ terms at the end.


Example 46.1

Suppose we have a series and wish to fit a curve which best approximates to sets of seven points; and suppose we regard a cubic as providing a satisfactory approximation. What are the weights of the moving average?

We have $m = 3$ and $p = 3$, and our polynomial is

$$u_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3.$$

Taking our origin at $t = 0$, we find, for equations (46.4), in virtue of the fact that $\sum t^k = 0$ for odd $k$,

$$\begin{aligned}
\sum u &= 7a_0 + 28a_2,\\
\sum tu &= 28a_1 + 196a_3,\\
\sum t^2u &= 28a_0 + 196a_2,\\
\sum t^3u &= 196a_1 + 1588a_3.
\end{aligned} \tag{46.6}$$

The first and third of these give

$$a_0 = \tfrac{1}{21}\Bigl(7\sum u - \sum t^2 u\Bigr) = \tfrac{1}{21}\{-2u_{-3} + 3u_{-2} + 6u_{-1} + 7u_0 + 6u_1 + 3u_2 - 2u_3\}.$$

We may write this conveniently as $\tfrac{1}{21}[-2, 3, 6, 7, 6, 3, -2]$ or, since the weights are symmetrical, as $\tfrac{1}{21}[-2, 3, 6, 7]$.

Consider, for example, the series

    t      1    2    3    4    5    6    7    8    9   10
    u_t    0    1    8   27   64  125  216  343  512  729

We have, for the trend-value at $t = 4$,

$$a_0 = \tfrac{1}{21}\{(-2\times 0) + (3\times 1) + (6\times 8) + \cdots - (2\times 216)\} = 27.$$

The trend-value is equal to the actual value of the series, and this obviously must be so when we note that we are fitting a cubic to the series

$$u_t = (t-1)^3.$$

It will be observed that in this example we should have obtained the same value for $a_0$ if we had fitted quadratics instead of cubics, for $a_0$ does not depend on $a_3$ in equations (46.6); and generally the case $p$ odd includes the case of the next lowest (even) value of $p$, so that we need not give separate formulae for even $p$.
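The calculation of Example 46.1 can be repeated for any extent and degree by solving the normal equations (46.4) with a unit observation placed at each point in turn. The following is a minimal sketch in Python using exact rational arithmetic; the function names `solve` and `trend_weights` are illustrative and do not come from the text.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix @ x = rhs exactly by Gauss-Jordan elimination on Fractions."""
    n = len(matrix)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def trend_weights(m, p):
    """Weights c_t, t = -m..m, such that a_0 = sum(c_t * u_t) for a degree-p fit."""
    ts = range(-m, m + 1)
    # Normal equations (46.4): entry (i, j) is the sum of t**(i + j).
    normal = [[Fraction(sum(t ** (i + j) for t in ts)) for j in range(p + 1)]
              for i in range(p + 1)]
    weights = []
    for s in ts:
        # A unit observation at time s gives right-hand sides s**i;
        # the resulting a_0 is the weight attached to u_s.
        a = solve(normal, [Fraction(s ** i) for i in range(p + 1)])
        weights.append(a[0])
    return weights


cubes = [(t - 1) ** 3 for t in range(1, 8)]      # first seven terms of the series
w = trend_weights(3, 3)
print([int(c * 21) for c in w])                  # numerators over the divisor 21
print(sum(c * u for c, u in zip(w, cubes)))      # trend value at t = 4
```

Because the arithmetic is exact, the weights can be compared directly with the fractions quoted in the example, and the quadratic and cubic fits can be checked to give the same weights.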


46.5 Writing $a_0[k]$ for the value of $a_0$ calculated in the above manner for an average of $k$ successive terms, we find the following formulae up to $p = 5$. The reader may care to verify them for himself as an exercise. It will be evident that the sum of coefficients in any formula is unity; for if we apply the trend to a set of values all equal to unity, the result must be unity.

Quadratic and Cubic

    [5]    (1/35)   [-3, 12, 17]
    [7]    (1/21)   [-2, 3, 6, 7]
    [9]    (1/231)  [-21, 14, 39, 54, 59]
    [11]   (1/429)  [-36, 9, 44, 69, 84, 89]
    [13]   (1/143)  [-11, 0, 9, 16, 21, 24, 25]
    [15]   (1/1105) [-78, -13, 42, 87, 122, 147, 162, 167]
    [17]   (1/323)  [-21, -6, 7, 18, 27, 34, 39, 42, 43]
    [19]   (1/2261) [-136, -51, 24, 89, 144, 189, 224, 249, 264, 269]
    [21]   (1/3059) [-171, -76, 9, 84, 149, 204, 249, 284, 309, 324, 329]

(46.7)

Quartic and Quintic

    [7]    (1/231)    [5, -30, 75, 131]
    [9]    (1/429)    [15, -55, 30, 135, 179]
    [11]   (1/429)    [18, -45, -10, 60, 120, 143]
    [13]   (1/2431)   [110, -198, -135, 110, 390, 600, 677]
    [15]   (1/46189)  [2145, -2860, -2937, -165, 3755, 7500, 10125, 11063]
    [17]   (1/4199)   [195, -195, -260, -117, 135, 415, 660, 825, 883]
    [19]   (1/7429)   [340, -255, -420, -290, 18, 405, 790, 1110, 1320, 1393]
    [21]   (1/260015) [11628, -6460, -13005, -11220, -3940, 6378,
                       17655, 28190, 36660, 42120, 44003]

(46.8)


46.6 It is sometimes more convenient to express these formulae in terms of the differences of the series $\Delta^r u_t$, where

$$\Delta u_t = u_{t+1} - u_t. \tag{46.9}$$

Thus, for example,

$$\tfrac{1}{21}[-2, 3, 6, 7, 6, 3, -2] = u_t - \tfrac{1}{21}(9\Delta^4 + 9\Delta^5 + 2\Delta^6)u_{t-3}, \tag{46.10}$$

which exhibits at once the fact that our fit is exact for a cubic, i.e. as far as fourth and higher differences. Or we may equally well represent the process as a moving average of the differences, which provides a convenient method of calculation when the higher differences are smaller than lower-order differences or the original values of the series. For instance

$$\tfrac{1}{21}[-2, 3, 6, 7] = u_t + \tfrac{1}{21}\{2\Delta^3 u_{t-3} + 3\Delta^3 u_{t-2} - 3\Delta^3 u_{t-1} - 2\Delta^3 u_t\} \tag{46.11}$$

$$= u_t + \tfrac{1}{21}[2, 3, -3, -2]\,\Delta^3 u_{t-3} \tag{46.12}$$

$$= u_t - \tfrac{1}{21}[2, 5, 2]\,\Delta^4 u_{t-3}. \tag{46.13}$$

We can obviously represent such formulae in a variety of different ways. (46.13) is particularly convenient because it gives us the residuals immediately.


Example 46.2

Suppose we wish to represent one of the other formulae in (46.8) in this manner, say the quintic fitted to 11 points:

$$\tfrac{1}{429}[18, -45, -10, 60, 120, 143].$$

We first of all subtract unity from the middle term to give

$$\tfrac{1}{429}[18, -45, -10, 60, 120, -286]. \tag{46.14}$$

The sum of coefficients must now be zero. Denote by $U$ a shift operator such that

$$Uu_t = u_{t+1}. \tag{46.15}$$

Then

$$\Delta = U - 1. \tag{46.16}$$

The moving average (46.14) may, apart from the divisor 429, be written

$$18 - 45U - 10U^2 + 60U^3 + 120U^4 - 286U^5 + 120U^6 + 60U^7 - 10U^8 - 45U^9 + 18U^{10}.$$

We know that this is exact as far as fifth differences and consequently $\Delta^6 = (U-1)^6$ must be a factor. We find

$$(U-1)^6\,(18U^4 + 63U^3 + 98U^2 + 63U + 18).$$



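The original formula is thus equivalent to

$$u_t + \tfrac{1}{429}[18, 63, 98, 63, 18]\,\Delta^6 u_{t-5}. \tag{46.17}$$

46.7 The following are the formulae for (46.7) and (46.8) in terms of differences:

Quadratic and Cubic

    [5]    u_3  - (1/35)   [3] Δ^4 u_1
    [7]    u_4  - (1/21)   [2, 5] Δ^4 u_2
    [9]    u_5  - (1/231)  [21, 70, 115] Δ^4 u_3
    [11]   u_6  - (1/429)  [36, 135, 280, 385] Δ^4 u_4
    [13]   u_7  - (1/143)  [11, 44, 101, 168, 210] Δ^4 u_5
    [15]   u_8  - (1/1105) [78, 325, 790, 1435, 2100, 2478] Δ^4 u_6
    [17]   u_9  - (1/323)  [21, 90, 227, 434, 686, 924, 1050] Δ^4 u_7
    [19]   u_10 - (1/2261) [136, 595, 1540, 3045, 5040, 7266, 9240, 10230] Δ^4 u_8
    [21]   u_11 - (1/3059) [171, 760, 2005, 4060, 6930, 10416, 14070, 17160, 18645] Δ^4 u_9

(46.18)

Quartic and Quintic

    [7]    u_4  + (1/231)    [5] Δ^6 u_1
    [9]    u_5  + (1/429)    [15, 35] Δ^6 u_2
    [11]   u_6  + (1/429)    [18, 63, 98] Δ^6 u_3
    [13]   u_7  + (1/2431)   [110, 462, 987, 1302] Δ^6 u_4
    [15]   u_8  + (1/46189)  [2145, 10010, 24948, 42273, 51198] Δ^6 u_5
    [17]   u_9  + (1/4199)   [195, 975, 2665, 5148, 7623, 8778] Δ^6 u_6
    [19]   u_10 + (1/7429)   [340, 1785, 5190, 10875, 18018, 24453, 27258] Δ^6 u_7
    [21]   u_11 + (1/260015) [11628, 63308, 192423, 426258, 759003,
                              1135134, 1450449, 1581294] Δ^6 u_8

(46.19)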
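The factorization used in Example 46.2 amounts to repeated division of the weight polynomial by $(U - 1)$, which reduces to taking running sums of the coefficients. The sketch below illustrates this; the helper names are hypothetical, and coefficients are listed highest power first, which for these symmetric averages makes no difference to the final quotient.

```python
def divide_by_u_minus_1(coeffs):
    """Synthetic division by (U - 1); coefficients are listed highest power first."""
    quotient, running = [], 0
    for c in coeffs:
        running += c
        quotient.append(running)
    remainder = quotient.pop()          # equals the polynomial evaluated at U = 1
    if remainder != 0:
        raise ValueError("polynomial is not divisible by (U - 1)")
    return quotient


def difference_weights(half_weights, divisor, order):
    """Numerators Q with (average - identity) = (U - 1)**order * Q(U) / divisor."""
    full = list(half_weights) + list(half_weights[-2::-1])   # restore the symmetric set
    full[len(half_weights) - 1] -= divisor                   # subtract unity from the middle term
    for _ in range(order):
        full = divide_by_u_minus_1(full)
    return full


print(difference_weights([-2, 3, 6, 7], 21, 4))                  # cubic on 7 points
print(difference_weights([18, -45, -10, 60, 120, 143], 429, 6))  # quintic on 11 points
```

For the cubic the quotient comes out with a negative sign, corresponding to the minus sign in (46.13); for the quintic it is positive, as found in Example 46.2.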

46.8 Several methods have been proposed to simplify the arithmetic of fitting a trend-line by moving averages, the large numbers in some of the expressions in (46.7) and (46.8) involving considerable labour in straightforward application. The simplest, perhaps, is that of iterated averages.

Suppose we take an average of sets of four with equal weights (a very simple process) and then another average of the same kind of that average. If the series is $u_1, u_2, \ldots$, the effect of the first operation will be to give terms like

$$v_1 = \tfrac14(u_1 + u_2 + u_3 + u_4)$$

and that of the second operation to give

$$w_1 = \tfrac14(v_1 + v_2 + v_3 + v_4) = \tfrac{1}{16}(u_1 + 2u_2 + 3u_3 + 4u_4 + 3u_5 + 2u_6 + u_7). \tag{46.20}$$

We write this symbolically as

$$\{\tfrac14[1, 1, 1, 1]\}^2 = \tfrac{1}{16}[1, 2, 3, 4] \tag{46.21}$$

or, reserving the symbol $\tfrac{1}{k}[k]$ for a simple arithmetic mean of $k$ terms, as

$$\tfrac{1}{16}[4]^2 = \tfrac{1}{16}[1, 2, 3, 4]. \tag{46.22}$$

Now compare the weights of the average derived in Example 46.1 for fitting a cubic to seven points. Reduced to unit divisors, we have for the weights of the latter

    -0.0952, 0.1429, 0.2857, 0.3333

and for the weights of (46.20)

    0.0625, 0.1250, 0.1875, 0.2500.

The two are not identical, but they follow the same sort of course and it might be possible to regard the latter as an approximation to the former. (We shall derive better approximations presently, but this will serve for purposes of illustration.) Now the iterated summation resulting in (46.20) is much easier to carry out than the single weighted averaging process of Example 46.1. Generally, if we can find averages with simple integral weights, preferably unity, which will, in conjunction, give approximations to the more complicated weights of a single average, it is usually easier to use the iteration process.
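Iterating simple averages is the same as multiplying their weight polynomials, so the weights of (46.20) can be obtained by a convolution. A minimal sketch follows, with illustrative function names; it prints the iterated weights beside those of the seven-point cubic for comparison.

```python
from fractions import Fraction


def convolve(a, b):
    """Coefficients of the product of two weight polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def iterated_average(*extents):
    """Weights of successive equal-weight averages of the given extents."""
    weights = [Fraction(1)]
    for k in extents:
        weights = convolve(weights, [Fraction(1, k)] * k)
    return weights


double_four = iterated_average(4, 4)
cubic_seven = [Fraction(w, 21) for w in (-2, 3, 6, 7, 6, 3, -2)]
for a, b in zip(double_four, cubic_seven):
    print(f"{float(a):.4f}  {float(b):.4f}")
```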


46.9 In the notation of finite differences, write

$$\delta u_t = u_{t+\frac12} - u_{t-\frac12}. \tag{46.23}$$

We have, for the second "central" difference $\delta^2 u_t$,

$$\delta^2 u_t = (u_{t+1} - u_t) - (u_t - u_{t-1}) = (U - 2 + U^{-1})u_t. \tag{46.24}$$

Writing

$$U = \exp(2i\phi), \tag{46.25}$$

we find, symbolically,

$$\delta^2 = \exp(2i\phi) - 2 + \exp(-2i\phi) = -4\sin^2\phi. \tag{46.26}$$

Then

$$\sum_{j=-m}^{m} u_j = \sum_{j=-m}^{m} U^j u_0 = \Bigl(1 + 2\sum_{j=1}^{m}\cos 2j\phi\Bigr)u_0,$$

since the terms in $\sin 2j\phi$ vanish,

$$= \frac{\sin(2m+1)\phi}{\sin\phi}\,u_0. \tag{46.27}$$

Thus

$$\frac1k[k]u_0 = \frac1k\,\frac{\sin k\phi}{\sin\phi}\,u_0
= \Bigl\{1 - \frac{k^2-1}{3!}\sin^2\phi + \frac{(k^2-1)(k^2-9)}{5!}\sin^4\phi - \cdots\Bigr\}u_0$$

$$= u_0 + \frac{k^2-1}{2^2\,3!}\,\delta^2 u_0 + \frac{(k^2-1)(k^2-9)}{2^4\,5!}\,\delta^4 u_0 + \cdots. \tag{46.28}$$

This interesting formula gives the arithmetic average in terms of the middle term $u_0$ and its central differences.

If now our series is approximately represented by a cubic, so that fourth differences vanish, we have, taking $u_0$ as the middle term,

$$\frac1k[k]u_0 = u_0 + \frac{k^2-1}{24}\,\delta^2 u_0, \tag{46.29}$$

and this equation will in any case be true up to third differences. Similarly, for two iterated averages we have, to the same order,

$$\frac{1}{k_1k_2}[k_1][k_2]u_0 = u_0 + \tfrac{1}{24}(k_1^2 + k_2^2 - 2)\,\delta^2 u_0, \tag{46.30}$$

and so on. We will use these results to derive two formulae in very general use by actuaries for "graduating" a series, a process which is very similar to that of fitting a trend-line.
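Formula (46.29) is easily checked numerically on a series that is exactly cubic, for which it should hold without error. The sketch below does so for odd $k$; the function names and the particular cubic are arbitrary choices made for illustration.

```python
from fractions import Fraction


def simple_average(u, centre, k):
    """Mean of k consecutive terms centred on index `centre` (k odd)."""
    m = k // 2
    return Fraction(sum(u[centre - m: centre + m + 1]), k)


def central_second_difference(u, centre):
    """Second central difference of u at index `centre`."""
    return u[centre + 1] - 2 * u[centre] + u[centre - 1]


def average_from_differences(u, centre, k):
    """Right-hand side of (46.29): u_0 + (k^2 - 1)/24 times the second difference."""
    return u[centre] + Fraction(k * k - 1, 24) * central_second_difference(u, centre)


series = [t ** 3 - 2 * t ** 2 + 5 for t in range(21)]   # an arbitrary cubic
for k in (3, 5, 7, 9):
    print(k, simple_average(series, 10, k), average_from_differences(series, 10, k))
```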

Example 46.3 Spencer's 15-point formula

Consider three successive averages with equal weights

$$\tfrac{1}{80}[4][4][5]u_0 = u_0 + \tfrac{1}{24}(4^2 + 4^2 + 5^2 - 3)\,\delta^2 u_0 = u_0 + \tfrac94\,\delta^2 u_0.$$

Multiplying by $1 - \tfrac94\delta^2$, we then have, to third differences,

$$u_0 = \tfrac{1}{80}[4]^2[5]\,(1 - \tfrac94\delta^2)\,u_0.$$

Substituting for $\delta^2$ the formula $[1, -2, 1]$, as given by (46.24), we find

$$u_0 = \tfrac{1}{320}[4]^2[5][-9, 22, -9].$$

Now without affecting the order of the approximation we may add factors in $\delta^4$ or higher central differences, and can simplify the numerical coefficients to some extent. Let us add to the factor $[-9, 22, -9]$ a term $-3\delta^4 = [-3, 12, -18, 12, -3]$. The result is $[-3, 3, 4, 3, -3]$, giving

$$u_0 = \tfrac{1}{320}[4]^2[5][-3, 3, 4, 3, -3]. \tag{46.31}$$

This is Spencer's 15-point formula. It covers sets of 15 consecutive terms, the weights in full being

$$\tfrac{1}{320}[-3, -6, -5, 3, 21, 46, 67, 74].$$

Example 46.4 Spencer's 21-point formula

In a similar way we find

$$\tfrac{1}{175}[5]^2[7] = 1 + 4\delta^2,$$

giving, to third differences,

$$u_0 = \tfrac{1}{175}[5]^2[7][-4, 9, -4]. \tag{46.32}$$

We now add to the factor $[-4, 9, -4]$ the expression

$$-3\delta^4 - \tfrac12\delta^6 = [-3, 12, -18, 12, -3] + [-\tfrac12, 3, -7\tfrac12, 10, -7\tfrac12, 3, -\tfrac12],$$

giving

$$u_0 = \tfrac{1}{350}[5]^2[7][-1, 0, 1, 2, 1, 0, -1]. \tag{46.33}$$

This is Spencer's 21-point formula.
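The full sets of weights implied by (46.31) and (46.33) follow by multiplying out the factors. The sketch below does this with a plain convolution; `graduation` is an illustrative name, and the divisor is taken as the sum of the integer weights so that the coefficients sum to unity.

```python
def convolve(a, b):
    """Coefficients of the product of two weight polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def graduation(extents, factor):
    """Integer weights and divisor for simple sums of the given extents times `factor`."""
    weights = list(factor)
    for k in extents:
        weights = convolve(weights, [1] * k)
    return weights, sum(weights)


spencer_15, divisor_15 = graduation((4, 4, 5), [-3, 3, 4, 3, -3])
spencer_21, divisor_21 = graduation((5, 5, 7), [-1, 0, 1, 2, 1, 0, -1])
print(divisor_15, spencer_15[:8])    # first half of the 15-point weights
print(divisor_21, spencer_21[:11])   # first half of the 21-point weights
```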
46.10 Simplicity of calculation, however, is not nowadays as important as it once was, and for certain purposes these approximations are to be avoided. The formulae (46.7) and (46.8) provide lines of closest fit for assigned extent and degree of polynomial. It follows that for given $m$ and $p$ the sum of squares of weighting coefficients is a minimum. In fact, if we apply the moving average to a series consisting of a polynomial trend of degree $p$ plus a random residual $\varepsilon_t$ (which has the same distribution for all $t$) the residual sum of squares is given by

$$\sum_t\,(c_0\varepsilon_t + c_1\varepsilon_{t+1} + \cdots + c_{2m}\varepsilon_{t+2m})^2,$$

so that the expected variance of residuals is

$$\operatorname{var}\varepsilon\,\sum_j c_j^2, \tag{46.34}$$

which is thus a minimum within the class of weights reproducing the same degree of polynomial for assigned $m$.

End-effects 

46.11 The moving-average method as we have expounded it has obvious properties of symmetry. It also has the drawback of failing to provide trend-values for the first $m$ and the last $m$ terms of the series. As a rule it is not a great loss to have to forgo the values at the beginning, but the absence of trend-values at the end is a serious handicap, especially when we want to extrapolate into the future. We can fill the gap in various ways, recognizing that trend-values at the end may not be so reliable as those in the middle. The method illustrated in the following example is probably as simple as any.

Example 46.5

Consider again the formula used in Example 46.1,

$$\tfrac{1}{21}[-2, 3, 6, 7].$$

We obtained this by fitting a cubic, but used that cubic only to determine the middle point of a set of seven. There is no reason why we should not also use it to determine the last three points of the end-set of seven. To do so, however, we need to solve equations (46.6) for $a_1, a_2, a_3$ as well as for $a_0$.

We find

$$a_1 = \tfrac{1}{1512}\Bigl\{397\sum tu - 49\sum t^3u\Bigr\},$$

$$a_2 = \tfrac{1}{84}\Bigl\{-4\sum u + \sum t^2u\Bigr\},$$

$$a_3 = \tfrac{1}{216}\Bigl\{-7\sum tu + \sum t^3u\Bigr\}.$$

From these results substituted in the polynomial we find, in an obvious notation,

$$u_t = \tfrac{1}{21}[-2, 3, 6, 7, 6, 3, -2] + \tfrac{t}{252}[22, -67, -58, 0, 58, 67, -22] + \tfrac{t^2}{84}[5, 0, -3, -4, -3, 0, 5] + \tfrac{t^3}{36}[-1, 1, 1, 0, -1, -1, 1]. \tag{46.35}$$

Thus, for example, with $t = 1, 2, 3$, the expressions reduce to

$$u_1 = \tfrac{1}{42}[1, -4, 2, 12, 19, 16, -4], \tag{46.36}$$

$$u_2 = \tfrac{1}{42}[4, -7, -4, 6, 16, 19, 8], \tag{46.37}$$

$$u_3 = \tfrac{1}{42}[-2, 4, 1, -4, -4, 8, 39]. \tag{46.38}$$

If, for instance, the last seven terms were 0, 1, 8, 27, 64, 125, 216, the trend-value $u_1$ would be

$$\tfrac{1}{42}\{0 - 4 + 16 + 324 + 1216 + 2000 - 864\} = 64.$$

The next is

$$\tfrac{1}{42}\{0 - 7 - 32 + 162 + 1024 + 2375 + 1728\} = 125,$$

and the reader can verify that the last is 216. These results, of course, are exact because we are fitting a cubic to a cubic.

One interesting point is to be noticed about (46.35) to (46.38). As in the formula for $u_0$, the coefficients sum to unity, but they become more and more unequal from $u_0$ to $u_3$. The sum of squares of the weights, in general, increases, and the variance of the residual term therefore increases. This is a reflection of the fact that, as we depart more from the centre of the range, the fitted polynomials become less "reliable." The sums of squares of the coefficients in this case are

    u_0, 0.3333;   u_1, 0.4524;   u_2, 0.4524;   u_3, 0.9286.

The increase in sum of squares is not, however, strictly monotonic. Some further results on sums of squares have been tabulated by Cowden (1962) for extents up to 25.
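The end-point formulae of Example 46.5 can be generated directly from (46.35) by combining its four sets of weights for a chosen $t$. A minimal sketch is given below, with hypothetical names; it also evaluates the sum of squared weights, which measures how the variance of the fitted value grows towards the end of the range.

```python
from fractions import Fraction

# The four weight vectors of (46.35), for observations at t = -3, ..., 3.
CONSTANT = [Fraction(w, 21) for w in (-2, 3, 6, 7, 6, 3, -2)]
LINEAR = [Fraction(w, 252) for w in (22, -67, -58, 0, 58, 67, -22)]
QUADRATIC = [Fraction(w, 84) for w in (5, 0, -3, -4, -3, 0, 5)]
CUBIC = [Fraction(w, 36) for w in (-1, 1, 1, 0, -1, -1, 1)]


def end_weights(t):
    """Weights giving the fitted cubic's value at time t (t = 0 is the middle)."""
    return [c + t * l + t ** 2 * q + t ** 3 * k
            for c, l, q, k in zip(CONSTANT, LINEAR, QUADRATIC, CUBIC)]


def sum_of_squares(weights):
    """Sum of squared weights, proportional to the variance of the fitted value."""
    return sum(w * w for w in weights)


last_seven = [0, 1, 8, 27, 64, 125, 216]
for t in range(4):
    w = end_weights(t)
    fitted = sum(a * u for a, u in zip(w, last_seven))
    print(t, [int(a * 42) for a in w], fitted, float(sum_of_squares(w)))
```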

46.12 Results of the foregoing kind may also very conveniently be obtained by the use of orthogonal polynomials (28.18, Vol. 2). If we put $n = 2m+1$ we find, for the first four polynomials,

$$\begin{aligned}
\phi_0(t) &= 1,\\
\phi_1(t) &= \lambda_1 t,\\
\phi_2(t) &= \lambda_2\{t^2 - \tfrac13 m(m+1)\},\\
\phi_3(t) &= \lambda_3\{t^3 - \tfrac15 t(3m^2 + 3m - 1)\},\\
\phi_4(t) &= \lambda_4\{t^4 - \tfrac17 t^2(6m^2 + 6m - 5) + \tfrac{3}{35}(m-1)m(m+1)(m+2)\}.
\end{aligned} \tag{46.39}$$

If now, for example,

$$u_t = b_0 + b_1\phi_1 + b_2\phi_2 + b_3\phi_3, \tag{46.40}$$

we have at once

$$u_0 = b_0 - \tfrac13\lambda_2 m(m+1)\,b_2. \tag{46.41}$$

From the tabulated values of the functions $\phi$ the values of $b$ can be obtained and hence the trend-values. We have, in virtue of the orthogonality property,

$$b_0 = \frac{\sum u_t\phi_0}{\sum\phi_0^2} = \frac{\sum u_t}{n}, \tag{46.42}$$

$$b_2 = \frac{\sum u_t\phi_2}{\sum\phi_2^2}. \tag{46.43}$$

For example, with a cubic fitted to sets of 7, $\lambda_2 = 1$, $m = 3$, and we have, from (46.42), (46.43) and the tables,

$$b_0 = \tfrac17\sum u_t, \qquad b_2 = \tfrac{1}{84}\sum(t^2 - 4)u_t.$$

Hence, from (46.40),

$$u_0 = \tfrac17\sum u_t - \tfrac{1}{21}\Bigl(\sum t^2u_t - 4\sum u_t\Bigr)$$

as in Example 46.1. A similar use of the polynomials will give us the end-values discussed in 46.11.


46.13 We have as yet said nothing about criteria by which we should decide the 
extent of a moving average, 2 m + 1 , or the degree of the polynomial, p, on which it 
should be based. There are, in fact, no simple criteria of this kind. One important 
reason for this is that a great deal depends on why we are interested in isolating the 
trend, or, to put the matter in a rather different way, what is the underlying model 
which determines our dissection of the series. If we are concerned chiefly to describe 
a broad trend in the data, and are not particularly interested in short-term and residual 
effects, one type of moving average may be adequate. But if we want to remove the 
trend in order to study the residuals, such a type may be quite inappropriate; and 
indeed, for some purposes, we may well question whether it is safe to eliminate trend 
by a moving average at all. Before, then, we can adequately discuss the choice of
a suitable method of finding the trend we must consider the effect of our methods 
on residual variation. 


The effect of trend-elimination by moving averages on other components 

46.14 In Table 46.1 we have applied the Spencer 21-point formula to an artificial series obtained by adding a random element to a cubic. (We have chosen this formula rather than one of (46.7) because the effect of successive simple averages can also be seen.) Specifically,

$$u_t = (t-26) + \tfrac{1}{10}(t-26)^2 + \tfrac{1}{100}(t-26)^3 + \varepsilon_t. \tag{46.44}$$

The component $\varepsilon_t$ was taken from tables of random numbers and consists of samples from a population in which all integral values from 0 to 99 are equally frequent. The various columns of the table illustrate the process of fitting, and we may note in passing that for a series as short as this it is convenient to leave the more difficult summations to the last as there are substantially fewer of them.

Now we know that the Spencer formula will fit a cubic exactly, so that when we subtract the trend from the original series we ought to eliminate the systematic constituent entirely and be left with our random component, except in so far as we have rounded off the systematic element to the nearest unit. A comparison of columns (2) and (9) in Table 46.1, remembering that the latter includes an element 49.5 equal to the mean of the random component, shows that we do not do so. The reason is not far to seek. The moving average has acted on the random element itself and determined a "trend-line" in it.

The results of applying the Spencer 21-point formula to the random element $\varepsilon_t$ are shown in column (11). We should expect that if the method were perfect the values in this column would be 49.5, the mean of $\varepsilon_t$, apart from irregular sampling effects;
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- 119 S —90 

-105 If —17 -246 

=S 48 I -32 | -209 


-87 -572 


-70 59 _ 42 -241 

-60 I r„ <9 162 


6 -60 1 

7 -51 83 

g -44 72 

9 -37 59 

10 -31 93 

11 I —26 76 

12 -22 24 

13 -18 97 

14 -15 8 

15 -12 86 

16 -10 95 

17 -8 23 

IS -7 3 

19 / —6 I 67 

20 - 5 44 

21 -4 5 

!2 / -3 54 

3 -2 55 

4 -2 50 

J -1 43 

i 0 10 


30 

6 

90 

96 

31 

9 

61 

70 

32 

12 

18 

30 

33 

15 

37 

52 

34 

20 

44 

64 

35 

24 

10 

34 


12 162 

I 05 413 2233 

ill 670 3801 

164 844 5120 

215 957 5984 

1 86 996 6642 

198 1078 7041 

233 1026 7145 

246 1071 , 7038 

163 1069 6934 

231 984 6709 

196 850 6535 

112 892 6408 

148 853 6363 

205 852 6446 

192 944 6611 

195 1024 6769 

204 1031 7052 

228 1015 7353 

212 1050 I 7610 

176 1136 7923 

230 1153 8249 

290 1201 8607 

245 1337 9019 

260 1357 9424 

312 1373 9870 


30 96 

36 22 

44 13 

52 43 

61 14 

71 87 
83 I 16 


230 

1153 

290 

1201 

245 

1337 

260 

1357 

312 

1373 

250 

1462 

306 

1541 

334 

1599 

339 

1760 

370 

1897 


44 | 

109 

45 

124 

46 

140 

47 

158 

48 

177 , 

49 

198 

50 I 

240 

51 1 

244 


159 692 
156 794 
180 935 
201 997 
239 Hu 
221 1180 
270 ! 


411 2047 14699 

443 2233 16060 

484 2452 17570 

525 2711 19353 

589 2960 21394 

670 3270 23690 

692 3680 26255 

794 4088 

935 4529 

997 5017 


14352 

15470 

15815 

15676 

14978 

14166 

13379 

12703 

12169 

12102 

12279 

12676 

13228 

13857 

14508 

15120 

15634 

16251 

17002 

17717 

18499 

19307 

20159 

21133 

22417 

23797 

25737 

27955 

30456 

33334 

36716 
but not only do the observed values deviate from this mean, they do so systematically, the values having a small oscillatory movement which is shown as part of the trend.

46.15 This effect is vital, particularly if we are eliminating trend so as to concentrate attention on oscillations. We proceed to examine it more closely.

Suppose that we have a series composed of the sum of three parts, a trend $x_1(t)$, an oscillatory term $x_2(t)$, and a random element $x_3(t)$, so that

$$u_t = x_1 + x_2 + x_3. \tag{46.45}$$

If we determine the trend by a moving average, denoted by an operation $T$, clearly

$$Tu_t = Tx_1 + Tx_2 + Tx_3. \tag{46.46}$$

Let us now suppose that our method of determining trend is perfect in the sense that $Tx_1 = x_1$. Then, on subtracting (46.46) from (46.45) to eliminate trend, we find

$$u_t - Tu_t = x_2 - Tx_2 + x_3 - Tx_3. \tag{46.47}$$

The point of present interest is that the terms $Tx_2$ and $Tx_3$ in (46.47) may distort the genuinely oscillatory parts of the residual series and induce spurious oscillatory movements.


46.16 Consider the simple case when $x_2$ is a sine term, $\sin(\alpha + \lambda t)$, $t$ being integral. Since

$$\sum_{t=1}^{k}\sin(\alpha + \lambda t) = \frac{\sin\tfrac12 k\lambda}{\sin\tfrac12\lambda}\,\sin\{\alpha + \tfrac12(k+1)\lambda\}, \tag{46.48}$$

a simple moving average of $k$ consecutive terms will result in a sine series of the same period and phase as the original, but with the amplitude reduced by the factor

$$\frac1k\,\frac{\sin\tfrac12 k\lambda}{\sin\tfrac12\lambda}. \tag{46.49}$$

Iteration $q$ times will reduce the amplitude by the $q$th power of this factor.

Thus the term $Tx_2$ will be small if $k$ is large, $q$ is large, or if $\tfrac12 k\lambda$ is a multiple of $\pi$, that is, if the extent of the moving average is a period of the oscillation. But if $\lambda$ is small and $k\lambda$ is small the amplitude is reduced very little and $x_2 - Tx_2$ will largely disappear, i.e. the moving average will partially obliterate the term in $x_2$. In this case, $k\lambda$ being small, the extent of the moving average is small compared with the period of the harmonic term, that is to say the oscillation is a slow one. This result is what we should expect. A slow oscillation is treated as a trend by the moving average and eliminated accordingly. Generally, the moving average will emphasize the shorter oscillations at the expense of the longer ones. Furthermore, if the extent of the average is slightly greater than the period, the term (46.49) may have a negative sign, and consequently the difference from the trend may somewhat exaggerate the true oscillation.

It is not so easy to exhibit the precise effect of the moving average when the weights are unequal and the terms are not harmonic, but evidently the same kind of situation is apt to arise.
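The amplitude factor (46.49) can be compared with a direct average of a sine series. The following is a small numerical sketch; the function names and the values of $k$, $\lambda$ and $\alpha$ are arbitrary illustrations.

```python
import math


def gain(k, lam):
    """Amplitude factor (46.49) of a simple k-term average acting on sin(alpha + lam*t)."""
    return math.sin(0.5 * k * lam) / (k * math.sin(0.5 * lam))


def averaged_sine(k, lam, alpha, t0=0):
    """Direct average of sin(alpha + lam*t) over t = t0 + 1, ..., t0 + k."""
    return sum(math.sin(alpha + lam * (t0 + j)) for j in range(1, k + 1)) / k


k, lam, alpha = 5, 0.3, 0.7
direct = averaged_sine(k, lam, alpha)
by_formula = gain(k, lam) * math.sin(alpha + 0.5 * (k + 1) * lam)
print(direct, by_formula)

period = 12
print(gain(period, 2 * math.pi / period))       # extent equal to the period
print(gain(period + 1, 2 * math.pi / period))   # extent slightly greater than the period
```

With the extent equal to the period the factor vanishes (up to rounding), and with the extent one term longer it is negative, which is the case in which the residual oscillation is exaggerated.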
46.17 Now consider the effect of a simple moving average (one with equal weights) on the residual element $x_3$, which we will suppose to be a random element $\varepsilon$ with variance $v$. For the term $Tx_3$ we have

$$Tx_3 = \frac1k\sum_j \varepsilon_{t+j}, \tag{46.50}$$

the summation extending over the $k$ consecutive terms covered by the average. Consecutive values of $\varepsilon$ are independent, but consecutive values of $Tx_3$ are not: $Tx_3(a)$ and $Tx_3(b)$ have terms in common, and are correlated, if $|a-b| < k$. Thus the series $Tx_3$ is smoother than the original random series, and with further averagings will become smoother still.

46.18 The effect of taking a moving average of a random series will then be to generate a series which moves in an oscillatory manner, if the weights are such as to give a positive correlation between successive terms. This effect, the oscillation induced in moving averages of random series, is known as the Slutzky-Yule effect, after the two statisticians who (independently) studied it in detail.

The generated series is irregular in the cyclical sense, that is to say its peaks and troughs do not recur at equal intervals and the amplitudes of the oscillations vary considerably (though we shall prove a theorem of Slutzky's showing that certain kinds of repeated averaging lead, in the limit, to a sine curve). Nevertheless, such series bear a resemblance to the kind of movement which is observed, particularly in economic time-series, and we shall consider them in more detail later. For our present purposes we require to consider how far the process of trend-elimination itself may generate such effects, in order to be sure that oscillatory movements in a trend-free series have not been put there, so to speak, by our own arithmetical processes.


46.19 For this purpose we shall consider the period and variance of a series generated by the Slutzky-Yule effect.

Since the peaks and troughs do not recur at equal intervals there is no quantity which we can conveniently call the length of the oscillation. There will, in fact, be a distribution of lengths. We may define as the mean length either the mean period from peak to peak, or that from trough to trough; but this raises some difficulties as to whether we are prepared to admit as periods small ripples on the main undulation. To avoid this somewhat arbitrary character, we shall take as measure of oscillation the mean distance between points where the series changes sign from negative to positive, or "upcrosses" the axis. Suppose the series is generated by weights $a_1, \ldots, a_k$ acting on a random series $\varepsilon$ which is normally distributed about zero mean with variance $v$. Then the probability that

$$u_k = \sum_{j=1}^{k} a_j\varepsilon_j < 0, \tag{46.51}$$

$$u_{k+1} = \sum_{j=1}^{k} a_j\varepsilon_{j+1} > 0, \tag{46.52}$$

i.e. that the generated series changes sign from negative to positive, is the proportional frequency of

$$dF = (2\pi v)^{-\frac12(k+1)}\exp\Bigl\{-\frac{1}{2v}\sum_{j=1}^{k+1}\varepsilon_j^2\Bigr\}\,d\varepsilon_1\cdots d\varepsilon_{k+1} \tag{46.53}$$

between the hyperplanes $\sum a_j\varepsilon_j = 0$ and $\sum a_j\varepsilon_{j+1} = 0$. This is equal to the angle between these two planes, expressed as a fraction of $2\pi$, the angle being given by

$$\cos\theta = \sum_{j=1}^{k-1} a_ja_{j+1}\Big/\sum_{j=1}^{k} a_j^2. \tag{46.54}$$

Hence the mean distance between upcrosses is $2\pi/\theta$, where $\theta$ is given by (46.54).

46.20 In a similar way, the probability that

$$u_k - u_{k-1} > 0, \tag{46.55}$$

$$u_{k+1} - u_k < 0, \tag{46.56}$$

that is, that $u_k$ is a peak of the series, is the angle (again as a fraction of $2\pi$) between the two hyperplanes

$$\sum_{j=1}^{k} a_j\varepsilon_{j+1} - \sum_{j=1}^{k} a_j\varepsilon_j = 0, \tag{46.57}$$

$$\sum_{j=1}^{k} a_j\varepsilon_j - \sum_{j=1}^{k} a_j\varepsilon_{j-1} = 0, \tag{46.58}$$

and is given by

$$\cos\theta_1 = \frac{(a_2 - a_1)a_1 + (a_3 - a_2)(a_2 - a_1) + \cdots - a_k(a_k - a_{k-1})}{a_1^2 + (a_2 - a_1)^2 + \cdots + a_k^2}. \tag{46.59}$$

Thus the mean distance between peaks is $2\pi/\theta_1$. The same formula obviously applies to mean distance between troughs.

46.21 If we wish to exclude "ripples" of a certain length $d$ from consideration, we may enquire for the probability that (46.55) and (46.56) are satisfied in conjunction with

$$u_k > u_{k+d}. \tag{46.60}$$

This is evidently the area cut off on the unit sphere by the three planes (46.57), (46.58) and

$$\sum_{j=1}^{k} a_j\varepsilon_j - \sum_{j=1}^{k} a_j\varepsilon_{j+d} = 0. \tag{46.61}$$

If the angles between the planes are $A$, $B$ and $C$, this area is $A + B + C - \pi = \theta_2$, say. The mean length between peaks, ripples excepted, is then $4\pi/\theta_2$.


Example 46.6 

In Table 46.2 we show 480 terms of a series of random numbers which can take integral values from 0 to 19, together with a moving sum of fives of a moving sum of threes. Fig. 46.1 shows a portion of the derived series graphically. There are 474 terms of the smoothed series.

The mean value of our series is $15 \times 9.5 = 142.5$. The number of upcrosses will be found from the table to be 23, the first between the 19th and 20th terms of the smoothed series, the last between the 459th and the 460th. The mean distance between upcrosses is then $440/22 = 20.0$ units. How does this compare with the mean distance given by "normal" theory?

The weights of the graduation are $[1, 2, 3, 3, 3, 2, 1]$ and from (46.54) we have

$$\cos\theta = \tfrac{34}{37} = 0.9189, \qquad \theta = 23^\circ 14'.$$

Hence the mean distance = 360/23.233 = 15.5 units. The observed mean distance is 20.0 units, but this is based on rectangular variation, and we are, perhaps, entitled to expect some difference from normal theory. For rectangular random variables, values distant from the mean occur more frequently, and it is not surprising to find oscillations in the series which do not result in upcrosses.

Fig. 46.1. Graph of the last 117 terms of the series S of Table 46.2 (abscissa: number of term $t$)

The number of peaks in the series will be found to be 62, the first at the seventh term, the last at the 466th. Hence the mean distance between peaks is 459/61 = 7.5 units. From formula (46.59) we find

$$\cos\theta_1 = \tfrac23, \qquad \theta_1 = 48^\circ 11'.$$

Thus the theoretical mean distance is 360/48.187 = 7.5 units, in good agreement with experiment. It will be observed that several of the distances between peaks are due to very small ripples.
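The theoretical distances quoted in Example 46.6 follow from (46.54) and (46.59) once the weights are given. The sketch below assumes, as the text does, a normal random series; the function names are illustrative. It treats a peak as a change of sign in the first differences, which are themselves a moving average of the random series with weights $-a_1, a_1 - a_2, \ldots, a_{k-1} - a_k, a_k$.

```python
import math


def lag1_correlation(weights):
    """cos(theta) of (46.54): lag-one products over the sum of squares."""
    products = sum(a * b for a, b in zip(weights, weights[1:]))
    return products / sum(a * a for a in weights)


def mean_distance_between_upcrosses(weights):
    """Mean distance 2*pi/theta between upcrosses of the generated series."""
    return 2 * math.pi / math.acos(lag1_correlation(weights))


def mean_distance_between_peaks(weights):
    """Mean distance between peaks, from sign changes of the first differences."""
    diffs = [-weights[0]]
    diffs += [a - b for a, b in zip(weights, weights[1:])]
    diffs += [weights[-1]]
    return mean_distance_between_upcrosses(diffs)


graduation = [1, 2, 3, 3, 3, 2, 1]
print(round(mean_distance_between_upcrosses(graduation), 2))
print(round(mean_distance_between_peaks(graduation), 2))
```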
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46.22 Let us noW ^ e or iginal random has var i a nce kv and its mean k 

^"random “le moving average has a variant Jj 

l he suu follow tnai * ^.-relations between successive memk 

rs^-^om«--.. <. - —rs 

the derived series. 

weights «!,•••> 2 tfy Sj = ?7i> sa y 

S a 3 - e j+1 = Vi> sa y > (46.62] 

S £j+n-1c Vn—lc+l' J 

i f +k* enm of these values is zero since the expected value of £ 
b/Sen tob/so. Since there are n-k + 1 terms we have for the variance 

1 - n / J ^ 


-Sj?*. 


;ero since the expected value of £ 
terms we have for the variance 

(46.63) 


The expected value of this, since the e’s are independent, is 

—L— E 2 vf = E^ 2 ) = ® 2 <if. (46.64) 

«—A + l <=i 7 

In particular, if the as are all equal to 1/A, the expected value of the variance is v/k. 
This gives us the average reduction in the variance. 

If a simple average of extent $k$ is iterated $q$ times the weights are the successive
coefficients in

$\frac{1}{k^q}(1 + x + x^2 + \ldots + x^{k-1})^q = \frac{1}{k^q}\left(\frac{1-x^k}{1-x}\right)^q.$

The sum of squares of these coefficients is the coefficient of $x^{q(k-1)}$ in

$\frac{1}{k^{2q}} \frac{(1-x^k)^{2q}}{(1-x)^{2q}},$ (46.65)

and this gives the average reduced variance for a simple average of $k$ iterated $q$ times.
The following are the values of the reducing factor for some values of $k$ and $q$:
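The reducing factor can also be obtained numerically rather than from the generating function (46.65). The sketch below is only an illustration: the function names are ours, and NumPy is assumed to be available. It builds the weights by repeated convolution and sums their squares.

```python
import numpy as np

def iterated_average_weights(k, q):
    """Weights of a simple average of extent k applied q times in succession."""
    w = np.ones(1)
    for _ in range(q):
        # Each pass convolves the current weights with k equal weights 1/k.
        w = np.convolve(w, np.full(k, 1.0 / k))
    return w

def reducing_factor(k, q):
    """Sum of squared weights, i.e. expected variance after smoothing divided by v."""
    w = iterated_average_weights(k, q)
    return float(np.sum(w ** 2))

# Tabulate the factor for a few illustrative values of k and q.
for k in (3, 5, 7):
    print(k, [round(reducing_factor(k, q), 4) for q in (1, 2, 3)])
```

For $q = 1$ the factor is simply $1/k$, as stated above; for $k = 3$, $q = 2$ the weights are $\frac{1}{9}[1, 2, 3, 2, 1]$ and the factor is $19/81$.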
46.23 It is also instructive to consider the effect of the moving average on serial
correlations in residuals. For the series (46.51) generated by a moving average on
a random series we have, as at (46.54),

$\operatorname{cov}(u_t, u_{t+s}) = E\left\{\sum_j a_j \varepsilon_{j+t} \sum_j a_j \varepsilon_{j+s+t}\right\} = v \sum_{j=1}^{k-s} a_j a_{j+s},$ (46.66)

and thus for the $s$th serial correlation of the resultant series

$r_s = \frac{\sum_{j=1}^{k-s} a_j a_{j+s}}{\sum_{j=1}^{k} a_j^2}, \quad |s| < k,$

$r_s = 0, \quad |s| \ge k.$ (46.67)

Thus, for an infinite series generated in this way we see that, whereas the original 
(random) series had zero serial correlations, the induced series is serially correlated 
up to order k, i.e. as long as terms in the generated series have any terms of the original 
series in common. 

For example, with a simple moving average of extent $k$, all the $a$'s are equal to $1/k$,
and from (46.67) we easily find

$r_s = 1 - \frac{|s|}{k},$ (46.68)

so that the correlation may be quite high for $s = 1$ and falls off linearly, as $s$ increases,
to zero at $s = k$. High correlations of this kind between neighbouring values are
responsible for the Slutzky-Yule effect.

Example 46.7 

The weights of the Spencer 21-point formula are

$\frac{1}{350}[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60].$

Apart from the divisor 350, which may be disregarded for present purposes, the sum
of squares of weights is 17,542. The products (46.66) and the corresponding serial
correlations are as follows:
     s    Product (46.66)      r_s
     2                        0.836
     3                        0.660
     4                        0.461
     5                        0.269
     6                        0.111
     7                        0.000
     8                       -0.061
     9                       -0.082
    10                       -0.074
    12                       -0.030
    13         -214          -0.012
    14          -27          -0.002
    15           50           0.003
    16           59           0.003
    17           40           0.002
    18           19           0.001
    19            6           0.000
    20            1           0.000
    21            0           0.000
Correlogram of series generated by the Spencer 21-point formula (Example 46.7)
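The products and serial correlations of Example 46.7 follow directly from (46.66) and (46.67), and can be checked with a few lines of code. The following is a minimal sketch, assuming NumPy; the function names are illustrative only.

```python
import numpy as np

# Half of the symmetric Spencer weights as quoted in the text (divisor 350 omitted).
half = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60]
spencer = np.array(half + half[-2::-1], dtype=float)  # all 21 weights

def lag_products(a):
    """Sum over j of a_j * a_{j+s} for s = 0, 1, ..., k-1, as in (46.66) apart from v."""
    a = np.asarray(a, dtype=float)
    k = len(a)
    return np.array([np.dot(a[: k - s], a[s:]) for s in range(k)])

def serial_correlations(a):
    """Serial correlations (46.67) induced in a random series by the weights a."""
    p = lag_products(a)
    return p / p[0]

products = lag_products(spencer)
r = serial_correlations(spencer)
for s in range(len(spencer)):
    print(s, int(products[s]), round(float(r[s]), 3))
```

Applied to equal weights, e.g. `serial_correlations(np.ones(5))`, the same function reproduces the linear fall $1 - s/k$ of (46.68).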


The variate-difference method 

46.24 The concept of a series which consists of a polynomial element and a
residual of a more or less random kind has given rise to a method which purports to
eliminate the former by differencing. Clearly, differencing a sufficient number of times
will eventually entirely eliminate any element which is actually a polynomial in the time,
and may be relied upon almost to eliminate any other systematic element, e.g., perhaps,
exponential or cyclical terms. Let us consider the effect of differencing upon a random
series $\varepsilon_t$. We have

$\Delta^r \varepsilon_t = \varepsilon_{t+r} - \binom{r}{1}\varepsilon_{t+r-1} + \binom{r}{2}\varepsilon_{t+r-2} - \ldots + (-1)^r \varepsilon_t.$

Taking, without loss of generality, $\varepsilon$ to have zero mean, we have

$E(\Delta^r \varepsilon_t) = 0,$

$\operatorname{var}(\Delta^r \varepsilon_t) = v \sum_{j=0}^{r} \binom{r}{j}^2 = v \times \text{coeff. of } x^r \text{ in } (1+x)^r (x+1)^r = v \binom{2r}{r}.$
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TIME-SERIES: TREND AND SEASONALITY 
We may then derive an estimate of $v$ by writing

$\hat{v} = \frac{\sum (\Delta^r \varepsilon_t)^2}{(n-r)\binom{2r}{r}}.$ (46.72)

It is to be noticed that we use the second moment about zero, not the observed variance
of $\Delta^r \varepsilon_t$, since the mean is known to be zero. This shortens the arithmetic
to some extent.

The factor $\binom{2r}{r}$ for $r$ = 1 to 10 has the following values:

     r     C(2r, r)     1/C(2r, r)
     1            2     0.5
     2            6     0.166667
     3           20     0.05
     4           70     0.0142857
     5          252     0.00396825
     6          924     0.00108225
     7        3,432     0.000291375
     8       12,870     0.0000777001
     9       48,620     0.0000205677
    10      184,756     0.00000541254



46.25 Basing itself on equation (46.72), the method of variate differences proceeds
as follows. We difference the series once, find the second moment about zero of
the resultant, and divide by 2; we then difference again and find the second moment
about zero, dividing in this case by 6; and so on. If the successive estimates of $v$ decrease
we continue with the differencing. There will, in general, come a point when they
cease decreasing and remain constant within sampling limits (which may be rather
wide). At this stage we may suppose that we have eliminated the systematic element
in the original series. The final estimate gives us an estimate of the variance of the
random element in the original series, and the order of the difference to which we have
had to go will give an indication of the degree of the polynomial representing the
systematic component.
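The procedure just described is easily mechanized. The sketch below is an illustration only, assuming NumPy; the function name and the simulated series are ours and are not taken from the text. It returns the successive estimates of $v$ from (46.72), to be inspected for the point at which they cease to decrease.

```python
from math import comb
import numpy as np

def variate_difference_estimates(u, max_order):
    """Return [v_1, ..., v_R]: sum of squared r-th differences / ((n - r) * C(2r, r))."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    estimates = []
    d = u
    for r in range(1, max_order + 1):
        d = np.diff(d)  # r-th differences; there are n - r of them
        estimates.append(float(np.sum(d ** 2)) / ((n - r) * comb(2 * r, r)))
    return estimates

# Hypothetical illustration: a quadratic trend plus rectangular random numbers 1..100.
rng = np.random.default_rng(0)
t = np.arange(1, 201)
series = 0.01 * (t - 100) ** 2 + rng.integers(1, 101, size=t.size)
for r, est in enumerate(variate_difference_estimates(series, 6), start=1):
    print(r, round(est, 1))
```

For a series with no random element, such as 0, 1, 4, 9, 16, the estimates fall to exactly zero at the third difference, which is a convenient check on the routine.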


Example 46.8 

Let us apply the variate-difference technique to the series of Table 46.1. We
know from the method of constructing the series that the systematic part ought to be
completely eliminated after the third differencing, and also that the random part
consists of an element with variance 833 approximately. In fact, the random numbers
from 1 to $N$ have a variance $(N^2 - 1)/12$, and $N$ in this case is 100. The actual variance
of the random element in Table 46.1 is 843.
Table 46.3 shows the series and the differences up to $\Delta^6$. For the sums of squares
in the various columns $S_j$ corresponding to $\Delta^j$, we find:

    S_1 =    107,541
    S_2 =    318,115
    S_3 =  1,033,513
    S_4 =  3,445,308
    S_5 = 11,720,069
    S_6 = 40,548,844

To obtain second moments we divide by $51 - j$ and then, to obtain the estimate of $v$,
by $\binom{2j}{j}$. We find the following:

    j    Estimate
    1    1075.41
    2    1082.02
    3    1076.58
    4    1047.21
    5    1011.05
    6     975.20



Curiously enough, the estimate for j = 2 is higher than that for j = 1, and there 
is little difference between the various estimates. In the ordinary way we should have 
concluded that the systematic component was adequately represented by a polynomial 
of order 1, that is to say a straight line, and that the residual random element had a 
variance of about 1000. 
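The arithmetic of this example is short enough to check directly. The lines below are a sketch only, with variable names of our own choosing; they recompute the six estimates from the quoted sums of squares, taking $n = 51$ and dividing by $(51 - j)\binom{2j}{j}$.

```python
from math import comb

# Sums of squares of the j-th differences, as quoted in Example 46.8.
sums_of_squares = {1: 107541, 2: 318115, 3: 1033513,
                   4: 3445308, 5: 11720069, 6: 40548844}
n = 51  # number of terms in the series of Table 46.1

# Divide by the number of differences (n - j) and by the factor C(2j, j).
estimates = {j: s / ((n - j) * comb(2 * j, j)) for j, s in sums_of_squares.items()}
for j, est in estimates.items():
    print(j, round(est, 2))
```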

The reader must not be surprised to find discrepancies of this kind between theory
and experiment in short series; and the discrepancy is not, in fact, as big as it seems.
The variance of the original series is 6272.61. The mean square of the first difference,
divided by 2, is 1075.41, so that about five-sixths of the variance has been eliminated
by the first differencing; and the method indicates, quite correctly, that the greater
part of the systematic element is linear. The random element is rather large compared
with the non-linear systematic terms, and the latter have got caught up in it:
the series is too short for the variate-difference method to disentangle them. Consider,
for instance, the cubic term $(t-26)^3/100$. In the original series this varies
in value from -156.25 to +156.25. First differences reduce it to $3(t-26)^2/100$,
varying from 18.75 through zero to 18.75, whereas the random element is increased
in range from 0 to 198. Already the systematic term is being swamped by the random
element, and a slight degree of accidental correlation between the two can easily account
for the increase in the mean square of second differences.

The matter may be put in a slightly different way. Suppose that, relying on the
variate-difference method, we regarded the data as represented by a linear equation
plus a random residual. If we fitted a straight line by least squares and examined
the residuals, we should probably find very little evidence of departure from randomness.
This representation would differ from the mode of construction of the series,
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cated formulae being 


4 < 

* V- 1 


46.27 Write 




(46.73) 


Then we have 


/^2> 


(46.74) 


£(A r M) 2 _ ( S ^i)^ 2 

*re is -he « ^^u r .,- ■ ■ ■ + (-W 

+ {b 0 u r+z -b 1 u r+1 + b i u r - .. • +(-l) Mb} 

+ {^ 0 K B - M?i-l+Mti- 2 - • • * 4-( — 1 ) r K U n-fS J 
Consider first of all the terms in this which result in fourth powers of $u$. They
will derive from

E{bluf +1 + bluf+ ... + b 2 u\ 

+b 0 u ? +2 + blu? +1 + ... +b 2 r u-l 


(46.75) 1 


+ 


w .. +blul+b\u\_ i+ ... +b 2 u 2 _ r } 2 . 

Writing now 

= (*!) 2 +(*5+*!) 2 + ... +(6J+62 + ... 

A S = (bl+b ];+ .. .+b;y- = AV 

we find that the term in E (« 4 ) is ' 1 ' 

Sfonlyodtertermappear^^t^. 2 ^^- 


(46.76) 

(46.77) 

(46.78) 


will write 
in terms of 


(46.79) 

-c or lyp e m tQ) t # m If the reader 
n t at the coefficients are expressible 


and ^ H 

_ 

• + i r-,-- 1 6,_ 1 )a. (46.81) 


are expressible 

+i4 ... +i j\ 2 _ / 2r \2 

^ ~ (46.80) 

) ! +(h»4 i + 4 l 6 j+I+ . 


i 
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The expression for £'(A r «) 4 reduces to 

(n — 2r)A%E(u*) + 4{(n—2r+l)Al + (n — 2r + 2)Al+ • ■ • 

+ A*(n-2r + r)}E(u | i&) + £(« 4 ) 

+ 8{ J Bf + 5|+ . . . +B?}£( m z m 2). ^ • > 

Substituting $\mu_4$ for $E(u^4)$ and $\mu_2^2$ for $E(u_i^2 u_j^2)$, dividing by $(n-r)^2 \binom{2r}{r}^2$ and subtracting $\mu_2^2$,
we find the sampling variance of the estimate of $v$. The expression can, however,
be simplified to some extent. Putting
-■•-SOVO'-SOW^SOV-)’*- 

we find, after lengthy algebraic rearrangement, 

var 4 f .fidaL J U 

( ”" r )(r) ”- r { (,,-r) (r) } 

+ ^{G:)/( 2 ;) 2 -^)}’ '<* (46 - 8 

If terms of order $(n-r)^{-2}$ can be neglected, this reduces to



(46.83) 




(46.84) 


j«4-3 /«! 


/4A 2^4 //2rV 
\2r)'n-r/ \r ) 


or, using the Stirling approximation to factorials, 

-{j“4 — 3^1 + /“! V(2 y7T )}> 


(46.85) 


(46.86) 


which is a fair approximation to (46.85), being within 3 per cent for $r$ as low as 6.

When the population of values of $u$ is normal, $\mu_4 - 3\mu_2^2$ vanishes and the formula
simplifies accordingly.


46.28 In a similar way it may be shown that 


T (2r\ ’ if/2r+2\ l = ^ ~ \ 1 (2r\(2r +2 

(»"<) 1 (r)Ul 


where 


r + 1 


(n — r— 1) 


’4r+l 

2r 


2w —2r—1 


n-r /2r\/2r + 2\' n-r-\ 2(»-r-l) 


^=s C)‘g:d % 2 s O' g: ;y- 


(46.87) 
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The arctic ^ 

to a certain extent. 


W 3 “ “ 

further modification* 

Example 46.9

For the data of Table 45.4 (sheep population) an application of the
method up to the tenth difference gave the following results:

     r    Estimate of v
     1        3468
     2        1442
     3         854
     4         629
     5         518
     6         448
     7         401
     8         371
     9         357
    10         347


The values here are falling steadily from r = 1 to r = 10 hut vp™ c i;„u<.i * 
the end. From (46.88) for r = 6 we have for the variant of The £ % 
approximately, and for r = 10, 25-8 approximately. It appears that thf T*’ ^ 
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and the quotient when this is divided by ( r ) ten8s 

2 J^r ~ vw 

and so increases without limit. In such a case we cannot obtain an estimate of the
variance of any random element which may be present.

The problem of testing differences between $S_r$ and $S_{r+1}$, or the equivalent problem
of testing whether the ratio $S_r/S_{r+1}$ is near unity, is complicated by correlations between
the differences which compose these quantities. Tintner (1940) and Johnson (1948)
have suggested methods of overcoming the difficulty, but they involve sacrificing a large
proportion of the data.

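46.30 There is an intimate connexion between the variances of the differences of
a series and its serial correlations. We have for a series of $n$ terms

$\sum_{t=1}^{n-1} (\Delta u_t)^2 = \sum_{t=1}^{n-1} (u_{t+1} - u_t)^2 = \sum_{t=1}^{n-1} \{(u_{t+1} - \bar{u}) - (u_t - \bar{u})\}^2$

$= \sum_{t=1}^{n-1} (u_{t+1} - \bar{u})^2 - 2 \sum_{t=1}^{n-1} (u_{t+1} - \bar{u})(u_t - \bar{u}) + \sum_{t=1}^{n-1} (u_t - \bar{u})^2.$

Approximately, then, on division by $n - 1$,

$\operatorname{var} \Delta u_t = \operatorname{var} u - 2 \operatorname{cov}(u_{t+1}, u_t) + \operatorname{var} u = 2 \operatorname{var} u \,(1 - r_1).$ (46.89)

To the same degree of approximation,

$\operatorname{var} \Delta^2 u_t = \operatorname{var}(u_{t+2} - 2u_{t+1} + u_t) = \operatorname{var} u \,(6 - 8r_1 + 2r_2).$ (46.90)

Likewise, we have in general

$\operatorname{var} \Delta^p u_t = \operatorname{var} \sum_{j=0}^{p} (-1)^j \binom{p}{j} u_{t+j}$

$= \operatorname{var} u \left\{ \sum_{j} \binom{p}{j}^2 - 2 r_1 \sum_{j} \binom{p}{j}\binom{p}{j+1} + 2 r_2 \sum_{j} \binom{p}{j}\binom{p}{j+2} - \ldots \right\}$

$= \operatorname{var} u \left\{ \binom{2p}{p} - 2 r_1 \binom{2p}{p-1} + 2 r_2 \binom{2p}{p-2} - \ldots \right\}$

$= \operatorname{var} u \binom{2p}{p} \left\{ 1 - \frac{2p}{p+1} r_1 + \frac{2p(p-1)}{(p+1)(p+2)} r_2 - \ldots \right\}.$ (46.91)

We can similarly express the serial correlations in terms of variances of the differences.
Put

$V_j = \operatorname{var} \Delta^j u \Big/ \binom{2j}{j}, \qquad V_0 = \operatorname{var} u.$ (46.92)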

Then it may be shown (cf. Exercise 46.14) that 

A O / . O *1 \ 


(2!)2 nn, r 3 +... 


(46.931 
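The relation (46.91) is approximate only because of end-effects. If the series is extended by zeros at both ends (the device used in 46.31 below), the corresponding identity for raw sums of squares and products holds exactly, and this gives a convenient numerical check on the coefficients. The sketch below assumes NumPy; the function name and the test series are illustrative and are not taken from the text.

```python
from math import comb
import numpy as np

def difference_identity(u, p):
    """Compare the sum of squared p-th differences of the zero-extended series
    with C(2p,p)*R_0 - 2*C(2p,p-1)*R_1 + 2*C(2p,p-2)*R_2 - ...,
    where R_s is the sum of lagged products of u at lag s."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    padded = np.concatenate([np.zeros(p), u, np.zeros(p)])
    lhs = float(np.sum(np.diff(padded, n=p) ** 2))
    R = [float(np.dot(u[: n - s], u[s:])) if s < n else 0.0 for s in range(p + 1)]
    rhs = comb(2 * p, p) * R[0] + 2 * sum(
        (-1) ** s * comb(2 * p, p - s) * R[s] for s in range(1, p + 1)
    )
    return lhs, rhs

# Arbitrary illustrative series.
rng = np.random.default_rng(1)
u = rng.integers(-50, 51, size=40)
for p in range(1, 5):
    print(p, difference_identity(u, p))
```

Dividing the right-hand side by $\binom{2p}{p} R_0$ gives the bracket of (46.91), with $R_s/R_0$ in the role of $r_s$; for $p = 2$ the coefficients are 6, $-8$ and 2, as in (46.90).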
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with a 


n °4lectea f cSain end-effects, writing, for 


• f these iv * 11 

dative stopW n °e gIe cted_car. a in 

long series ’ rt " 1 w 2 ^ S *?+i • 

the formula as exact these end-effects from thTo^ 1 

thC 

Define a series u + 

v = X x yi+ X *y* + r y n ^ + ix n y n 

w) ’ • • • +*«-*y°-> +ix :‘- iy ’'- 1+ * x " y - 

%.(**> ' obeys the recurrence rule 

The general law of formation «—1 v x 

M) = i ?J?-'’ {Xin+ ' i+ ' yi+ ' } ’ 

s0 ** for example, £ W three tenns in S» have coefficients J, 1, 1 
Now define the modified quantities 

m V 0 = 2 (W) « 2 A 

m F a = S (w -d(A«) 2 /2« 


(46.94) 

(46.95) 

(46.96) 


(46.97' 


(46.9? 

(46.99 


v 

m v P 


= JV_„(A p «Y/&) n - 


(46.10 


(46.1C 
The simph 


(46.li 

(46.1 


Likewise define 2 

nJ'p = ^(m-p) u i ^i+p/^itn) • 

Then these quantities obey exactly the relations (46.91) and (46.93). The simplest
way of seeing this, perhaps, is to consider the series

$0, 0, 0, u_1, u_2, \ldots, u_n, 0, 0, 0, \ldots$

The first differences are

$0, 0, u_1, u_2 - u_1, \ldots, u_n - u_{n-1}, -u_n, 0, 0, \ldots$

and their sum of squares is

«(4 - 2 i “?-2 V«,. %+1 = 2 ,F 0 (1 - lf J. (46. 

In fact, for such a series, (46.97) is equivalent to 

n 

and the arcmmAnf u i. , (^(m) tq) ( w r /t .) 

f ana nence for our modified 


46.32 As we hav 

d of ^ polynomial plus 


:es. 


a 
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error we can also enquire, given a set of V’s, what is the best estimate of the error 
variance. The question has been examined by Quenouille (1953b), who seeks a linear 
function of the V’s which has minimal variance. In most practical cases it is more 
realistic to consider the possibility of serial correlation among the errors. Given 
a series consisting of a polynomial plus a serially correlated error, the problem 
is then to extend the variate-difference method so as to estimate the serial correlations. 

This question was also considered by Quenouille. Cf. Exercises 46.9-11. 

46.33 But, however we approach the subject—by fitting polynomials directly, 
by moving averages, or by any other smoothing process—we encounter the difficulties 
mentioned in 46.18. The trend elimination will distort the residuals. There seems 
no escape from this situation. We can only hope to make the best of it, and this we 
can do in two ways: by choosing methods which, other things being equal, minimize 
the distortion; and by arranging our procedure so that, if we have misgivings at any 
stage of the later analysis, we can disentangle the distortion due to smoothing from 
other elements in the residual series. We proceed to examine the possibilities of the 
second line of attack. 

46.34 Let us suppose that we divide our series of n terms into consecutive sets 
of s, and fit a polynomial of the same type to each set. Within any one set we may 
get a satisfactory trend-line. But clearly the line for any set must join on to the line 
for the next, and, in some acceptable sense, smoothly so. Subject to this matter, 
which we shall examine in a moment, such a method has the advantage that it treats 
the series as a set of independent blocks, and we can apply an analysis of variance to 
them. The method may be regarded as a compromise between the moving average 
and fitting a polynomial to the whole series. It was considered at length by Rhodes 
(1921) and has been extended by Quenouille (1949a). 

As a simple example of the method, consider the fitting of straight lines to sets
of three points. If the fitted value at $u_1$ is $2b_1$, and at $u_3$ is $2b_2$, the value at $u_2$ must be
$b_1 + b_2$. If, further, the fitted value to $u_5$ is $2b_3$ (that at $u_3$ being already determined as
$2b_2$), the value at $u_4$ is $b_2 + b_3$, and so on, the values being

    actual   u_1    u_2        u_3    u_4        u_5    u_6        u_7   ...
    fitted   2b_1   b_1+b_2    2b_2   b_2+b_3    2b_3   b_3+b_4    2b_4  ...    (46.105)

The trend-line so determined will be continuous, but its first derivative will be
discontinuous at $u_3$, $u_5$, $u_7$, etc., i.e. it consists of a series of straight lines of extent three.

46.35 The actual values of the constants b may be determined in the usual way 
by least squares, i.e. we may minimize 

$(u_1 - 2b_1)^2 + \{u_2 - (b_1 + b_2)\}^2 + (u_3 - 2b_2)^2 + \text{etc.},$ (46.106)

giving the set of equations

$2u_1 + u_2 - 5b_1 - b_2 = 0$

$u_2 + 2u_3 + u_4 - b_1 - 6b_2 - b_3 = 0$ (46.107)

etc.

The equations are not difficult to solve, but once again they can be simplified very 
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1 Let us, as usual, consider 


much by o f terms uv •'* " 2m+1 ) # a , « 3 

0 K Mi Um+1 


2b m , h + K- 


• th fitted constants 2 *n h+ b » '' V whic h we take the last term as eq Ual 

.^e procedure of f;, 3 c ^" lar .'- Writmg «, for 
This is render «he -*» 

to the first M the sum , , 2 , _ _. + (u 2m m) 


iius»“" ~ C e renacr 

o the first an h sUin +(u 2m -2b m ) 

re have to <““^_ (Ik+U }-+(«.- 2 « + 

TT q 


leading to the equations

$6b_1 + b_2 + b_m = u_1 + 2u_2 + u_3 = U_2$, say,

$b_1 + 6b_2 + b_3 = u_3 + 2u_4 + u_5 = U_4$, say,

$b_2 + 6b_3 + b_4 = u_5 + 2u_6 + u_7 = U_6$, say,

$\ldots$

$b_1 + b_{m-1} + 6b_m = u_{2m-1} + 2u_{2m} + u_1 = U_{2m}$, say. (46.110)

The advantage of this form is that the coefficients of the $b$'s form a symmetrical circulant
matrix which can be solved once and for all. (For the method see Quenouille, 1949a,
and Good, 1950; cf. Exercise 46.13.) The result is to express the $b$'s in terms of the
linear functions of the $U$'s in (46.110).

Exercise 46.12 generalizes the results above.


Example 46.10 

We revert to the data of Table 46.1 with values of $u_t$ as given in column (4),
except that $\tfrac{1}{2}(u_1 + u_{51})$ has been substituted for $u_1$ and $u_{51}$. The values are repeated in
Table 46.4. The functions $U_t$ are shown in column (3) and the corresponding values
of $b$ in column (4). Thus, for example,

$U_2 = 76.5 + 2(-90) + (-17) = -120.5$
$U_{50} = 221 + 2(270) + 76.5 = 837.5.$

Also

$6b_1 + b_2 + b_m = -242.2086 - 5.6002 + 127.3086 = -120.5002,$

in agreement with $U_2$. In our present example the total sum of squares is 474,458.25 and

$\sum b_t U_t = 448{,}274.26,$

so that the difference = 26,184. We have fitted 25 constants (to 51 observations,
one of which was adjusted), leaving
Table 46.4. Series of Table 46.1 fitted by straight lines to sets of threes

     t     u_t      U_t        b_t     |     t     u_t      U_t        b_t
     1    76.5                         |    27     75
     2   -90     -120.5    -40.3681    |    28     37      161      17.6740
     3   -17                           |    29     12
     4   -32      -92       -5.6002    |    30     96      274      39.4348
     5   -11                           |    31     70
     6   -59      -97      -18.0309    |    32     30      182      19.7171
     7    32                           |    33     52
     8    28      110       16.7855    |    34     64      214      24.2620
     9    22                           |    35     34
    10    62      196       27.3178    |    36    126      344      48.7106
    11    50                           |    37     58
    12     2      133       15.3071    |    38     57      267      27.4740
    13    79                           |    39     95
    14    -7      139       13.8390    |    40     75      403      53.4452
    15    74                           |    41    158
    16    85      259       40.6586    |    42     99      454      54.8549
    17    15                           |    43     98
    18    -4       68        1.2096    |    44    159      572      71.4253
    19    61                           |    45    156
    20    39      140       20.0840    |    46    180      717      88.5930
    21     1                           |    47    201
    22    51      156       18.2862    |    48    239      900     114.0164
    23    53                           |    49    221
    24    48      191       26.1987    |    50    270      837.5   127.3086
    25    42                           |    51     76.5
    26    10      137       15.5212    |

    Totals                             |                  6545.0   818.1244


25 degrees of freedom in the estimate of residual variance. The estimator is then
26,184/25 = 1047 against a value obtained (from first differences) in Example 46.8
of 1075.
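The whole of this calculation can be reproduced by solving the circulant equations (46.110) numerically. The sketch below assumes NumPy and uses a general linear solver in place of the special circulant method of Quenouille and Good; the variable names are ours. The data are the $u_t$ of Table 46.4 for $t$ = 1 to 50, the 51st term being identified with the first.

```python
import numpy as np

# u_t of Table 46.4, t = 1..50 (u_1 already replaced by (u_1 + u_51)/2 = 76.5).
u = np.array([
    76.5, -90, -17, -32, -11, -59, 32, 28, 22, 62,
    50, 2, 79, -7, 74, 85, 15, -4, 61, 39,
    1, 51, 53, 48, 42, 10, 75, 37, 12, 96,
    70, 30, 52, 64, 34, 126, 58, 57, 95, 75,
    158, 99, 98, 159, 156, 180, 201, 239, 221, 270,
], dtype=float)

m = len(u) // 2                       # number of fitted constants (25)
nodes = np.arange(1, 2 * m, 2)        # zero-based positions of t = 2, 4, ..., 50
# U_t = u_{t-1} + 2 u_t + u_{t+1}, taken circularly so that u_51 is u_1.
U = u[nodes - 1] + 2 * u[nodes] + u[(nodes + 1) % (2 * m)]

# Symmetrical circulant matrix: 6 on the diagonal, 1 on each neighbour (cyclically).
eye = np.eye(m)
A = 6 * eye + np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1)
b = np.linalg.solve(A, U)

ss_total = float(np.sum(u ** 2))      # total sum of squares
ss_fitted = float(b @ U)              # sum of squares due to the fitted constants
residual = ss_total - ss_fitted
print(round(ss_total, 2), round(ss_fitted, 2), round(residual / 25, 1))
```

The solved values of $b$ agree with column (4) of Table 46.4 to the four decimals shown there.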

But we can do better than this. The method of fitting lines to three points suggests 
that there may be correlation between observed residuals in neighbouring points of a 
set of three, but not between sets. 

We can, in fact, regard the series as \{n- 1) blocks of two, the two differences 
in a fitted triad having values typified by b 2 -b x , b 2 -b v The sum of squares within 
blocks is estimated by T \(I,u i -I l u i+1 ) 2 which is found to be 406-12. Thus we can 
analyse the total sum of squares as 



                        d.fr.         SS          Mean square
    Fitted constants     25       448,274.26
    Within blocks         1           406.12
    Residual             24        25,777.87         1074

    Total                50       474,458.25




The residual mean square is now in almost exact agreement with the value obtained 
                     1951    1952    1953    1954    1955    1956    1957    1958
    First quarter   295.0   324.7   372.9   354.0   333.7   323.2   304.3   312.5
    2nd    "        317.5   323.7   380.9   345.7   323.9   342.9   285.9   336.1
    3rd    "        314.9   322.5   353.0   319.5   312.8   300.3   292.3   295.5
    4th    "        321.4   332.9   348.9   317.6   310.2   309.8   298.7   318.4


Table 46.6 Data of Table 46.5 with origin 300, values multiplied by 10

                     1951   1952   1953   1954   1955   1956   1957   1958   Totals     Mean
    First quarter     -50    247    729    540    337    232     43    125     2203   275.375
    2nd    "          175    237    809    457    239    429   -141    361     2566   320.750
    3rd    "          149    225    530    195    128      3    -77    -45     1108   138.500
    4th    "          214    329    489    176    102     98    -13    184     1579   197.375

    Totals            488   1038   2557   1368    806    762   -188    625     7456

46.38 Consider first of all the possibilities of distribution-free tests. It is tempting
to rank the quarters within any one year from 1 to 4 and consider how the ranks vary
from year to year. A little reflection will show, however, that such a procedure does
not disentangle seasonal movement from trend. If the data were uniformly increasing
in time the first quarter would always rank the lowest; but this is not a seasonal effect.

In fact, to make any progress it appears necessary to make some attempt to eliminate
trend as a first step.

One simple approach is to assume that deviations from the annual average are 
seasonal. This is equivalent to supposing that there is a trend from year to year but 
that, within a year, departures from the year’s average are seasonal effects. This may 
obviously be a somewhat indifferent approximation to the truth. If we are prepared 
to adopt it, the procedure in Table 46.6 would be as follows. 

Over the eight years concerned the means for the four quarters are 275.375, 320.750,
138.500, 197.375, themselves with a mean of 233.000. The deviations from this
mean are then 42.375, 87.750, -94.500, -35.625. In terms of the original variables
of Table 46.5 the corresponding values would be 304.2375, 308.7750, 290.5500, 296.4375,
or, on the basis of a mean of 100, a third of these values, namely 101.41, 102.93, 96.85,
98.81. These are taken as indexes of seasonal variation. The general procedure, to
eliminate seasonality from the original data, would then be to divide all the first-quarter
figures by 101.41, all the second-quarter figures by 102.93, and so on.
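The arithmetic of this paragraph may be set out as follows. This is a sketch only, assuming NumPy, with variable names of our own; the figures are those of Table 46.6, and the final division by 3 follows the text in treating 300 as the base corresponding to 100.

```python
import numpy as np

# Table 46.6: rows are quarters 1..4, columns are the years 1951..1958.
coded = np.array([
    [-50, 247, 729, 540, 337, 232,   43, 125],
    [175, 237, 809, 457, 239, 429, -141, 361],
    [149, 225, 530, 195, 128,   3,  -77, -45],
    [214, 329, 489, 176, 102,  98,  -13, 184],
], dtype=float)

quarter_means = coded.mean(axis=1)              # means over the eight years
deviations = quarter_means - quarter_means.mean()
original_scale = 300 + deviations / 10          # undo the coding of Table 46.6
indexes = original_scale / 3                    # base 300 taken as 100

print(quarter_means)
print(deviations)
print(np.round(indexes, 2))
```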

To test these seasonal indexes, we must write down the model. Our present
procedure is to assume that each observation is the sum of three effects: a yearly value,
say $y$, a seasonal value (constant from year to year in proportional effect), say $s$, and
an error which is random. Thus,

$u_t = y_t s_q + \varepsilon, \qquad t = 1, \ldots, n; \quad q = 1, 2, 3, 4.$ (46.111)

If the trend is slow, so that the seasonal effect may be regarded as constant from year
to year in absolute (not proportional) magnitude, we can write approximately

$u = y_t + s_q + \varepsilon,$ (46.112)

which is an ordinary analysis of variance model with a two-fold classification. If
the trend is not slow we ought to work with logarithms so as to obtain

$\log u = \log y + \log s_q + \eta.$ (46.113)

The weakness of this model is apparent. We assume that $y$ is a constant for the
year so that, for example, the value for the fourth quarter of the first year is $y_1 + s_4$,
and that for the succeeding quarter is $y_2 + s_1$. These values may not "join on"
smoothly in the way required by our intuitive feeling about the smoothness of trend.
We shall therefore not bother with the arithmetic analysis. Indeed, we should have 
dismissed this method more summarily were it not for the fact that it is widely used 
in elementary texts. 

46.39 A second possibility is to use a moving average to eliminate trend before 
examining the residual values for seasonality. We then, of course, run into the danger 
of distorting the residuals. However, if we choose our moving average with care 
we can minimize this effect so far as concerns seasonal effects. We noted, in fact 
in 46.16 that if the simple moving average (with equal weights) is equal in extent to 
the period of a cyclical component, the trend-value of that component is zero, so that 
the residual is unimpaired. 

For the data of Table 46.5, this involves the use of a simple moving average of
fours, which will be adequate to remove a linear trend. Our original data, however, are
the averages for a quarter's prices and relate, then, to time periods of three months
centred on February 15, May 15, August 15, November 15 (or thereabouts). The
average of an (even) set of four will give us a trend-value at some point half-way between
these dates. To bring the time-point of the average back to comparability with the
originals we must "centre" the average. This is most simply carried out by taking
the mean of consecutive pairs of the four-point average. Thus, in Table 46.6 the mean
of the first four values is 122, and of the next four values is 196.25. The mean of
these two, 159.125, is taken as the trend-value corresponding to the third quarter of
1951, namely where the original value of the series is 149. The process is clearly
equivalent to fitting a five-point average with weights

$\tfrac{1}{8}[1, 2, 2, 2, 1].$ (46.114)

Proceeding in this manner on the data of Table 46.6, we find the residuals shown
in Table 46.7. The deviations of the means (each based on seven values) from the
overall mean of 24.05/4 are 62.45, 86.17, -88.39, -60.26. In terms of the original
variables the corresponding values would be 306.245, 308.617, 291.161, 293.974 or,
on the basis of a mean of 100, 102.08, 102.87, 97.06, 97.99. These are substantially
different from the results of the method of 46.38.
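The centred moving average and the resulting seasonal deviations can be computed in a few lines. The sketch below assumes NumPy; the variable names are ours, and the data are those of Table 46.6 taken in order of time. Results agree with the figures quoted above to within about 0.01, the small differences presumably arising from rounding in the printed residuals.

```python
import numpy as np

# Table 46.6 in time order: 1951 Q1, Q2, Q3, Q4, 1952 Q1, ...
coded = np.array([
    -50, 175, 149, 214,   247, 237, 225, 329,
    729, 809, 530, 489,   540, 457, 195, 176,
    337, 239, 128, 102,   232, 429,   3,  98,
     43, -141, -77, -13,  125, 361, -45, 184,
], dtype=float)

weights = np.array([1, 2, 2, 2, 1]) / 8.0          # centred four-point average (46.114)
trend = np.convolve(coded, weights, mode="valid")  # 28 trend-values, for terms 3..30
residuals = coded[2:-2] - trend

# Quarter (0 = first quarter) to which each residual belongs.
quarter = np.arange(2, len(coded) - 2) % 4
quarter_means = np.array([residuals[quarter == q].mean() for q in range(4)])
deviations = quarter_means - quarter_means.mean()
indexes = (300 + deviations / 10) / 3              # base 300 taken as 100, as in 46.38

print(trend[0], residuals[0])
print(np.round(deviations, 2))
print(np.round(indexes, 2))
```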

46.40 It is also of interest to see what happens if we eliminate trend by a more
elaborate form of moving average, and we will consider the fitting of a cubic to seven
points with weights $\tfrac{1}{21}[-2, 3, 6, 7]$. The residuals are shown in Table 46.8. The
seasonal indexes will be found to be

    102.27, 102.29, 97.31, 98.13

as compared with

    101.41, 102.93, 96.85, 98.81 (in 46.38)

    102.08, 102.87, 97.06, 97.99 (in 46.39)

Although the general picture is the same in all cases (a seasonal peak in the second
quarter, a seasonal trough in the third) there are large enough differences in these results
to embarrass us in work requiring great accuracy. Our inclination would be to use
the method of 46.39. That of 46.40 runs into some danger of fitting too well, in the
sense that the trend-line may embody some part of the seasonal effect. It seems
impossible, however, to lay down any completely objective rules for the treatment of
seasonal effect versus trend. Our general recommendation would be to try several
methods and to choose the one which appears to give the most reasonable results;
and, in any published work, to state exactly what has been done.

46.41 From the point of view of spectrum analysis, which we discuss in Chapter 49, 
there is more to be said about the effect of trend elimination both on random residuals 
and on seasonal components. We shall see that it is possible to correct the power 
spectrum for distortions due to trend fitting, at least in certain cases. 
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2^T Sut+ fn(m + l)(2m + l) 

coefficients of a moving average based on this line f 0r 


Hence show that the sum 
the point t = J is 


of squares of 


2m +1 
46.2 Fit a cubic to the last seven points of the sheep series of Table 45.4 and show that
it gives a trend for the final four values of 1639, 1687, 1750 and 1807.


46.3 Show that the weights in the Spencer 21-point formula are

$\frac{1}{350}[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60]$

and that if it is applied to a random series the variance of the resultant is about one-seventh
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is given by 

and generalize this result. 

(Cf. Higham, 1882-5) 

46.7 Show that in a long random series of normal elements with variance $\sigma^2$ the serial
correlations are uncorrelated, and that

$\operatorname{var} r_k = 1/n.$

Hence from (46.92) derive the large-sample formula 


var V p 




2 + 2p 2 (p-iy 


n { (p + iy^(p + l) 2 (j> + 2) 2 




(Quenouille, 1953b) 


46.8 Show that in the notation of 46.30 as applied to a random series with variance cr a 
and fourth cumulant k. 


Vi, V}) — — = I*. ( 2i + 2 A /( 2 !)( 2 A = - a, 


cov (Vi, ra--* = _ ^ . + ,JI y. A; .j = - say. 

(Quenouille, 1953b) 

46.9 In the previous exercise, given an estimator of the error variance 

m + p 

t = 2 a Vi 

i—m +1 

with variance (K 4 -Mcr 4 )/w, show that t has minimal variance if 

m+p 

2 aijd — X = 0, j = m + 1,..., m+p, 

i—M+1 

m+p 

— 2 ci = — 1. 

i=m+1 

(Quenouille, 1953b) 

46.10 A series consists of a polynomial in t of degree m plus a component which has the 
same variance for all t but the successive values of which may be serially correlated. 

Define 

o _(p + l)(p + 2) 

Riv - 2l 


AVp, p = m, m+1, . . . 


Show that for a long series 

D T// 4 P . 9 P(P-V ,, _ 16^0-l)(p-2) "1 

Rip ~ Vo \ ri p + 3 r2 + (p + 3)(p + 4) 3 (P + 3)(p + A)(p + 5) 4-1 * • ’J 
and hence that if the serial correlations higher than the first order vanish, 

E(R 1P ) = o*r v 

46.11 Continuing the previous exercise, show that, defining 

_ (j> + 2m-l)(p + 2m) 

(2m —l)(2m) ~ ” 

and if serial correlations higher than the mth vanish, that 

E(Rmp) = ° 2 r m - 


(Quenouille, 1953b) 
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(Good, 1950) 
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s 
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CHAPTER 47 


STATIONARY TIME-SERIES 


47.1 If we remove from a time-series the elements attributable to seasonal variation
and trend we shall, in general, be left with a series oscillating about some constant
value. This movement may be so small as to be virtually non-existent; the series
then consists entirely of seasonality or trend. Or the seasonality or trend may themselves
be non-existent, in which case the series is entirely oscillatory. In the present
chapter we shall study these oscillatory series, supposing that trend and seasonal effects
have been eliminated or do not exist. Strictly speaking, we ought, perhaps, to treat
seasonal effects as part of the oscillatory movement and not regard them as eliminated
beforehand. But we shall see that there are types of oscillation (the rule rather than
the exception) which are not seasonal in our sense, and it is better to keep them distinct
as far as possible.

47.2 Let us begin with some intuitive ideas. In Table 45.1 (barley yields) we
have an example of a series which fluctuates about a mean value to about the same
extent. But we might have a series, as in Fig. 47.1(a), in which the extent of oscillation
systematically increased or, as in Fig. 47.1(b), in which the amount of oscillation
itself oscillated. We shall exclude such cases from discussion and confine our attention




(a) (b) 

Fig. 47.1 (see text) 

to series for which the amplitude remains more or less constant. This does not mean 
that the amplitude of the swings has to be exactly the same, but that there is no 
systematic effect present. 

47.3 To make these ideas precise, consider a set of random variables arranged in
order: $u_1, u_2, \ldots, u_n, \ldots$. Let the distribution function of any set of $n$ consecutive
$u$'s, say $u_{t+1}, u_{t+2}, \ldots, u_{t+n}$, be

$$F(u_{t+1}, u_{t+2}, \ldots, u_{t+n}). \qquad (47.1)$$



47.4 In regarding a time-series as a set of random variables of unlimited extent as defined by a
distribution function we arrive at a new problem in the definition of mean values.
First of all, consider the mean value of, say, $u_1$, for different series generated
by (47.1). Or, in the second place, we can consider the limit of some set of $u$'s
(or a function of them) as $n$ tends to infinity.
In the first class of case we have to consider averages in a population composed of
different ways in which the series might happen. Each such series is possible, and
any one which occurs is a realization of the process. We may have only one
realization to examine; indeed, this is the rule rather than the exception. In
a way we then have only a sample of one observation from the process. We shall see
later why this does not seriously limit the possibility of inferences about the process.


47.5 We shall, from now on, assume that the mean and variance of $u$ exist. We
then have, for all $t$,

$$\mu = E(u_t) = \int_{-\infty}^{\infty} u_t \, dF(u_t) \qquad (47.2)$$

$$\sigma^2 = E(u_t - \mu)^2 = \int_{-\infty}^{\infty} (u_t - \mu)^2 \, dF(u_t). \qquad (47.3)$$

We also assume that any pair of $u$'s have an autocovariance

$$\gamma_k = E\{(u_t - \mu)(u_{t+k} - \mu)\} = \gamma_{-k}, \qquad (47.4)$$

with the corresponding autocorrelation

$$\rho_k = \gamma_k / \sigma^2 = \rho_{-k}. \qquad (47.5)$$

As noted in 45.33, the totality of coefficients $\rho_0 (= 1), \rho_1, \rho_2, \ldots$ is called the
correlogram of the series. We may distinguish the theoretical correlogram, based
on the autocorrelations, and the observed correlogram, based on the serial correlations
calculated for any particular series of finite length.

$$\begin{pmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{n-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \rho_{n-1} & \rho_{n-2} & \rho_{n-3} & \cdots & 1 \end{pmatrix} \qquad (47.6)$$

Any diagonal running from North-West to South-East has the same elements.

Example 47.1

The fact that the Laurent matrix is non-negative definite implies certain consistency
conditions on the autocorrelation coefficients. The determinant of any minor
on the main diagonal cannot be negative. Thus, for example,

$$\begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix} = 1 + 2\rho_1^2(\rho_2 - 1) - \rho_2^2 = (1 - \rho_2)(1 - 2\rho_1^2 + \rho_2) \ge 0.$$

Thus, unless $\rho_2 = 1$ (and even in that case) we have

$$\rho_2 \ge 2\rho_1^2 - 1, \qquad (47.7)$$

which is by no means a trivial result.

Example 47.2

As an example of a scheme which generates an autocorrelated stationary series,
consider a process defined by

$$u_{t+1} = \rho u_t + \varepsilon_{t+1}, \qquad (47.8)$$

where $\varepsilon$ is a random variable with zero mean, and values $\varepsilon_p$, $\varepsilon_q$ are uncorrelated for
$p \ne q$. We then have

$$E(u_{t+1}) = \rho E(u_t)$$

and except perhaps in the trivial case $\rho = 1$, it follows that stationarity requires

$$E(u_t) = 0, \quad \text{all } t. \qquad (47.9)$$

It will be seen from (47.8) that $u_t$ depends on $\varepsilon_t$, $\varepsilon_{t-1}$, etc., but not on $\varepsilon_{t+1}$. Thus
we have

$$E\{u_t(u_{t+1} - \rho u_t)\} = E(u_t \varepsilon_{t+1}) = 0$$

and hence

$$\operatorname{cov}(u_t, u_{t+1}) = \rho \operatorname{var} u = \rho\sigma^2$$

and the correlation between $u_t$, $u_{t+1}$ is $\rho$.

If $\rho_k$ is the $k$th order autocorrelation we have likewise

$$E\{u_t(u_{t+k} - \rho u_{t+k-1})\} = 0$$

and hence

$$\rho_k = \rho^k. \qquad (47.11)$$
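The result $\rho_k = \rho^k$ of Example 47.2 can be checked by simulation. The following is a minimal sketch and not part of the original text: it assumes normally distributed disturbances with unit variance, discards an initial stretch so that the start-up value has little influence, and the names `simulate_markoff` and `serial_correlation` are illustrative only.

```python
import random


def simulate_markoff(rho, n, seed=0, burn_in=500):
    """Simulate u[t+1] = rho * u[t] + e[t+1] with independent N(0, 1) disturbances."""
    rng = random.Random(seed)
    u = 0.0
    series = []
    for t in range(n + burn_in):
        u = rho * u + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the start-up transient
            series.append(u)
    return series


def serial_correlation(series, k):
    """Serial correlation of order k, measured about the sample mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    num = sum(dev[t] * dev[t + k] for t in range(n - k))
    den = sum(d * d for d in dev)
    return num / den


rho = 0.6  # illustrative value
u = simulate_markoff(rho, 20000)
for k in range(1, 5):
    # observed serial correlation beside the theoretical value rho**k
    print(k, round(serial_correlation(u, k), 3), round(rho ** k, 3))
```

With a series of this length the observed values should lie close to $\rho^k$, though they will not agree exactly, since they are estimates from a single realization.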

At this point, without prejudice to that discussion, we may present a few series
as illustrations of the type of material encountered in practice. Some of those in
Chapter 45 are, on the face of it, stationary in character, e.g. Table 45.1 (barley yields)
and Table 45.2 (rainfall). Table 47.1 is a famous series of trend-free index
numbers compiled by the late Lord Beveridge. It extends over 370 years, a phenomenal
length of time for economic series. Table 47.2 gives deviations from an 11-point
moving average of marriage rates in England and Wales for the period 1843-1896.
Table 47.3 is an artificial series obtained by superposing a random term on a simple
harmonic. Table 47.4 is another artificial series generated by a more elaborate scheme
of the type of Example 47.2.

Table 47.2. Marriage rate in England and Wales: deviation from a simple
11-year moving average for the years 1843-1896

Units 1 in 10,000


Marriage 

rate 


Marriage 

Year 

rate 

- 5 

1879 

- 7 

80 

1 

81 

6 

82 

8 

83 

9 

84 

- 2 

85 

- 8 

86 

-10 

87 

- 7 

88 

0 

89 

8 

90 

12 

91 

7 

92 

5 

! 93 

4 

94 

- 3 

95 

- 6 

1 

96 

1 


Marriage 

rate 


47.8 Suppose now that we have an observed series $u'_{t_1}, \ldots, u'_{t_2}$, the primes denoting
the fact that this is a single realization. Each $u$ has mean $\mu$ and variance $\sigma^2$.
Let us define a time average



(47.13) 

M(u') = lim («'). 

i ^—y — oo, £.j—>co 

(47.14) 


Then we appeal to a theorem of Birkhoff (1931) and Khintchin (1932) which we state 
without proof: 

(a) If $u_t$ is stationary with finite mean $\mu$, $M(u')$ exists for almost all realizations, i.e. with
probability unity.

(b) If and only if

$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \rho_j = 0 \qquad (47.15)$$

and (a) is satisfied, the time average equals the mean of the process:

$$E(u_t) = M(u'). \qquad (47.16)$$

This is a most important result. It implies that in practice we can estimate the
mean of the process from the time average of a single realization. If this were
not so, estimation from single realizations would be practically impossible.

TaMe ^ V y^ 

, r trim of Number of Value of Number of Value of 
N T™° f Sef S™° series term series 


5 

12 

7 

5 

3 

- 2 
-12 
-12 
- 8 
- 1 
11 
13 
12 
7 
5 

- 1 
- 6 
-14 
- 8 
1 


Graph of the values of Table 47.3 (ordinate: values of series)

Table 47.4. Values of series $u_t = 1.1u_{t-1} - 0.5u_{t-2} + \varepsilon_t$, where $\varepsilon_t$ is a
random variable with range $-9.5$ to $9.5$, rounded off to nearest unit


Number     Value of     Number     Value of
of term    series       of term    series

   1          7           23         -4
   2          6           24         -5
   3         -6           25         -9
   4         -4           26         -4
   5          3           27         -4
   6         -4           28          3
   7         -5           29          9
   8         -1           30          4
   9         10           31         -8
  10         10           32         -6
  11          6           33         -3

12 
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0 

14 
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36 
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15 

- 2 

37 
  16          6           38          3
  17         17           39         -1
  18         24           40         -8
  19         17           41         -3
  20          4           42         -8
  21          1           43        -10

22 
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44 
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45 
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53 
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the ADV ment £(a-/<) 2 is finite, the p roc . 

4,0 . /\ _ E(u) and the second irnpor tant extension of the Birkh 

«/ ^ We^Bte, again without Pto . true( fof almost all realizatiens, thj ^ 
called ev*c- " For an crgodic P««ss ciJcu|ated fro m a realization 

SSSS. is «-> to (47 

Here, again, the result enables us to estimate autocorrelations from a single realization.
Condition (47.15) is not very restrictive, but it is not purely formal either. It will
be obeyed if the autocorrelations tend to zero as the terms to which they relate
become further apart, but not if the series is a harmonic.



47.10 The correlogram, as we shall see later, is a useful instrument for exploring
the nature of the internal structure of a time-series. There is a second function which
stands in relation to the correlogram in much the same
relation as the characteristic function to the frequency function.

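In fact, let us define a function

$$W(\alpha) = \alpha + 2\sum_{j=1}^{\infty} \frac{\rho_j \sin \alpha j}{j}. \qquad (47.18)$$

If, for some $n$ onwards, $|\rho_{n+k}| < 1$, this converges. We also write, subject to existence,

$$w(\alpha) = \frac{dW}{d\alpha} = 1 + 2\sum_{j=1}^{\infty} \rho_j \cos \alpha j. \qquad (47.19)$$

In virtue of the relation $\rho_j = \rho_{-j}$, and the fact that $\sin\theta$ is an odd function of $\theta$, we
may also write

$$w(\alpha) = \sum_{j=-\infty}^{\infty} \rho_j \cos \alpha j \qquad (47.20)$$

$$= \sum_{j=-\infty}^{\infty} \rho_j e^{i\alpha j}. \qquad (47.21)$$

This last form exhibits $w(\alpha)$ as a Fourier transform of the sequence $\rho_j$. Multiplying
(47.20) by $\cos \alpha k$ and integrating term by term, we find

$$\int_0^{\pi} w(\alpha) \cos k\alpha \, d\alpha = \sum_{j=-\infty}^{\infty} \rho_j \int_0^{\pi} \cos \alpha j \cos \alpha k \, d\alpha = \pi\rho_k$$

and hence

$$\rho_j = \frac{1}{\pi} \int_0^{\pi} w(\alpha) \cos \alpha j \, d\alpha = \frac{1}{\pi} \int_0^{\pi} \cos \alpha j \, dW(\alpha). \qquad (47.22)$$

We may also write

$$\rho_j = \frac{1}{2\pi} \int_{-\pi}^{\pi} w(\alpha) e^{-i\alpha j} \, d\alpha. \qquad (47.23)$$

$W(\alpha)$ is called the integrated spectrum and $w(\alpha)$ the spectral density. From (47.18) we see that

$$W(0) = 0, \qquad W(\pi) = \pi, \qquad W(2\pi) = 2\pi.$$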
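The pair of relations (47.19) and (47.22) can be illustrated numerically. The sketch below is not from the original text: it assumes a correlogram that vanishes after a finite number of terms, uses illustrative values for $\rho_1$ and $\rho_2$, and replaces the integral in (47.22) by a midpoint sum. The function names are hypothetical.

```python
import math


def spectral_density(rho, alpha):
    """w(alpha) = 1 + 2 * sum_{j>=1} rho_j * cos(alpha * j), as in (47.19).

    rho is the list [rho_1, rho_2, ...]; higher autocorrelations are taken as zero.
    """
    return 1.0 + 2.0 * sum(r * math.cos(alpha * j) for j, r in enumerate(rho, start=1))


def autocorrelation_from_spectrum(rho, j, steps=2000):
    """rho_j = (1/pi) * integral over (0, pi) of w(alpha) cos(alpha j), as in (47.22).

    The integral is approximated by the midpoint rule with `steps` intervals.
    """
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        a = (i + 0.5) * h
        total += spectral_density(rho, a) * math.cos(a * j)
    return total * h / math.pi


rho = [0.4, 0.1]  # illustrative values; they give w(alpha) > 0 on (0, pi)
for j in range(4):
    # j = 0 should return unity, j = 1, 2 the assumed values, j = 3 zero
    print(j, autocorrelation_from_spectrum(rho, j))
```

Inverting the spectral density in this way returns the autocorrelations that were put in, which is the sense in which the two functions carry the same information.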
47.11 The power spectrum may also be introduced directly, without reference to
the correlogram, in the following manner.

For some given $\alpha$, let us consider the correlation or covariance of $u_t$ and $\cos \alpha t$. If
there is some rhythm (or pseudo-rhythm; let us not beg any questions) in $u_t$ with
frequency $\alpha$, the correlation will be high provided that $u_t$ and $\cos \alpha t$ are in phase. If
the series is at unit intervals and measured about its mean, consider

$$a(\alpha) = \frac{1}{\sqrt{n\pi}} \sum_{t=1}^{n} u_t \cos \alpha t, \qquad b(\alpha) = \frac{1}{\sqrt{n\pi}} \sum_{t=1}^{n} u_t \sin \alpha t. \qquad (47.24)$$

We have

$$I(\alpha) = a^2(\alpha) + b^2(\alpha)$$

$$= \frac{1}{n\pi} \Big\{ \Big(\sum u_t \cos \alpha t\Big)^2 + \Big(\sum u_t \sin \alpha t\Big)^2 \Big\}$$

$$= \frac{1}{n\pi} \Big\{ \sum_{t=1}^{n} u_t^2 + 2 \sum_{k=1}^{n-1} \sum_{t=1}^{n-k} u_t u_{t+k} \{\cos \alpha t \cos \alpha(t+k) + \sin \alpha t \sin \alpha(t+k)\} \Big\}$$

$$= \frac{1}{n\pi} \Big\{ \sum_{t=1}^{n} u_t^2 + 2 \sum_{k=1}^{n-1} \sum_{t=1}^{n-k} u_t u_{t+k} \cos k\alpha \Big\}$$

$$= \frac{s^2}{\pi} \Big\{ 1 + 2 \sum_{k=1}^{n-1} r_k \cos k\alpha \Big\} \qquad (47.25)$$

where $s^2 = \sum u_t^2 / n$ and $r_k$ is the correlation-type coefficient $\sum u_t u_{t+k} / \sum u_t^2$. In the limit
this becomes

$$I(\alpha) = \frac{\sigma^2}{\pi} \Big\{ 1 + 2 \sum_{k=1}^{\infty} \rho_k \cos k\alpha \Big\} \qquad (47.26)$$

$$= \frac{\sigma^2}{\pi} \Big\{ \sum_{k=-\infty}^{\infty} \rho_k \cos k\alpha \Big\}$$

$$= \frac{\sigma^2}{\pi} w(\alpha). \qquad (47.27)$$

The quantity $I$, which we call the intensity, is thus the spectral density multiplied by $\sigma^2/\pi$.
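The algebra leading from (47.24) to (47.25) is an exact identity for any finite series, and this can be confirmed directly. The following sketch is not part of the original text: it computes the intensity both ways for the first eleven terms of Table 47.4, and the function names are illustrative.

```python
import math


def intensity_direct(u, alpha):
    """I(alpha) = a(alpha)^2 + b(alpha)^2 with a, b defined as in (47.24)."""
    n = len(u)
    scale = math.sqrt(n * math.pi)
    a = sum(x * math.cos(alpha * t) for t, x in enumerate(u, start=1)) / scale
    b = sum(x * math.sin(alpha * t) for t, x in enumerate(u, start=1)) / scale
    return a * a + b * b


def intensity_from_correlations(u, alpha):
    """(s^2 / pi) * {1 + 2 * sum_k r_k cos(k alpha)}, the form (47.25)."""
    n = len(u)
    sum_sq = sum(x * x for x in u)
    total = 1.0
    for k in range(1, n):
        # correlation-type coefficient: sum of lagged products over sum of squares
        r_k = sum(u[t] * u[t + k] for t in range(n - k)) / sum_sq
        total += 2.0 * r_k * math.cos(k * alpha)
    return (sum_sq / n) * total / math.pi


raw = [7, 6, -6, -4, 3, -4, -5, -1, 10, 10, 6]  # first eleven terms of Table 47.4
mean = sum(raw) / len(raw)
u = [x - mean for x in raw]  # measure the series about its mean

for alpha in (0.3, 0.7, 1.5):
    print(alpha, intensity_direct(u, alpha), intensity_from_correlations(u, alpha))
```

The two columns agree to rounding error for every $\alpha$, which shows that computing the intensity from the series and computing it from the $r_k$ are the same calculation arranged differently.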


47.12 It is customary to graph the spectrum with $I$ as ordinate against $\alpha$ as abscissa.
In the earlier stages of the development of the subject it was more usual to compute

$$A = \frac{2}{n} \sum_{t=1}^{n} u_t \cos \frac{2\pi t}{\lambda}, \qquad \lambda = 2\pi/\alpha, \qquad (47.28)$$

$$B = \frac{2}{n} \sum_{t=1}^{n} u_t \sin \frac{2\pi t}{\lambda} \qquad (47.29)$$

and calculate

$$S^2 = A^2 + B^2 = \frac{4}{n} \sigma^2 w(\alpha) \qquad (47.30)$$

in the limit. The graph of $S^2$ against $\lambda$, the wavelength, was called the periodogram,




THE ADVANCED h some authors use “period 0gram „ 

*12 . i we shall preset , . 

in < 47 ^ for d* is tihltif « contains a ha^ 

d T?thf% ue of S ! a ! * 15 Juries” We shall return to the subject in Chapter 49. 
with «, 1 f or an infinite 

at that pot**. at leas correlogram or the spec rum m exploring 

...»»•-rst 

l!!!X^f r ^ kMwkdBSOt '!^T!h"%^' ni ia p hysics - but , therc 

’Zn is more revealing m, uld use both (e.g. oceanography meteorology, 

where the prudent research worker ^ ^ we have remarked, tells us some- 

and some biological P roc f. ses > values of the series which are separated in time, 
thing about the relationship Dew ^ ser i es is in step with certain fundamental 

The spectrum exhibits the:ex ^ tuning a radio S et, a signal of high power 

rhythms; calculating the P coincides with an incoming frequency. For 

,s"£ s ™ ■"r ,h | “T"' 

rams ta the generating system, but this is a procedure which must be carried out 
with some care in interpretation. 



“te . / 


Autocorrelation generating function

47.14 In the spectral density function

$$w(\alpha) = \sum_{j=-\infty}^{\infty} \rho_j e^{i\alpha j} \qquad (47.31)$$

put

$$z = e^{i\alpha}. \qquad (47.32)$$

Then

$$w(\alpha) = \sum_{j=-\infty}^{\infty} \rho_j z^j = G(z), \text{ say}, \qquad (47.33)$$

and we thus derive an autocorrelation generating function. We shall also find it useful
to work with an autocovariance generating function

$$C(z) = \sum_{j=-\infty}^{\infty} \gamma_j z^j = \sigma^2 G(z). \qquad (47.34)$$

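Moving-average series

47.15 Consider now a series $u_t$ and a moving average defined by

$$\xi_t = \sum_{j=0}^{\infty} \alpha_j u_{t-j}. \qquad (47.35)$$

We have here taken the average to be of infinite extent, so as to attain generality.
We then have

$$E(\xi_t \xi_{t+j}) = E\Big\{ \sum_{i} \alpha_i u_{t-i} \sum_{k} \alpha_k u_{t+j-k} \Big\}$$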
$$= \sum_{i,k=0}^{\infty} \alpha_i \alpha_k \gamma_{i+j-k}$$

$$= \sum_{i=0}^{\infty} \sum_{l=-\infty}^{i+j} \alpha_i \alpha_{i+j-l} \gamma_l. \qquad (47.36)$$

If the autocovariance function of $u$ is $C(z)$, it will be seen that the autocovariance
function of $\xi$ is given by

$$\Gamma(z) = \Big(\sum \alpha_j z^j\Big) \Big(\sum \alpha_j z^{-j}\Big) C(z). \qquad (47.37)$$

In particular, if $u$ is a purely random series $C(z) = 1$, and if we iterate a moving average
$k$ times, the autocovariance function of the resulting series is

$$\Gamma(z) = \Big(\sum \alpha_j z^j\Big)^k \Big(\sum \alpha_j z^{-j}\Big)^k. \qquad (47.38)$$

Example 47.4

Consider a moving average of 2, $\alpha_0 = \alpha_1 = \tfrac{1}{2}$, of a random series.

$$\Gamma(z) = \tfrac{1}{4}(1 + z)(1 + z^{-1}) = \tfrac{1}{4}(z^{-1} + 2 + z). \qquad (47.39)$$

This gives us, as is otherwise obvious,

$$\rho_1 = \tfrac{1}{2}, \qquad \rho_j = 0, \quad j \ne 0, 1.$$

The corresponding autocorrelation generating function is derived by standardizing
(47.39), so that the coefficient of $z^0$ is unity. Thus $G(z) = \tfrac{1}{2}(z^{-1} + 2 + z)$. Put now
$z = e^{i\alpha}$. We find for the spectral density function

$$w(\alpha) = \tfrac{1}{2}(e^{-i\alpha} + 2 + e^{i\alpha}) = 1 + \cos \alpha. \qquad (47.40)$$

The function is thus a cosine curve with a maximum at $\alpha = 0$. If we iterate the
average $k$ times we find

$$w(\alpha) \propto (1 + \cos \alpha)^k. \qquad (47.41)$$

The constant term by which $(1 + \cos \alpha)^k$ is to be multiplied to give $w(\alpha)$ is most easily
determined by the condition in 47.10 that

$$\int_0^{\pi} w(\alpha) \, d\alpha = \pi.$$

In our present case this gives

$$w(\alpha) = \frac{\sqrt{\pi}\, \Gamma(k+1)}{\Gamma(k+\tfrac{1}{2})} \Big( \frac{1 + \cos \alpha}{2} \Big)^k.$$

This, for increasing $k$, tends (relative to its maximum) to unity at $\alpha = 0$ and zero elsewhere.
All the $\rho_j$ tend to unity. The series thus tends to a constant value, as is otherwise evident
from the fact that successive iterations smooth out fluctuation.

On the other hand, if we take successive differences of the original random series,
$\alpha_0 = \tfrac{1}{2}$, $\alpha_1 = -\tfrac{1}{2}$ and we find after $k$ differences

$$w(\alpha) \propto \Big( \frac{1 - \cos \alpha}{2} \Big)^k. \qquad (47.42)$$

This tends (again relative to its maximum) to unity at $\alpha = \pi$. The even order autocorrelations
tend to $+1$, the odd order to $-1$. The series thus tends to terms which are equal in
absolute value but alternate in sign.
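The effect of iterating the average or the difference can be followed on the weights themselves. This sketch is an illustration only and not part of the original text: it relies on the result that, for a weighted average of a purely random series, the autocorrelation of order $j$ is $\sum_i w_i w_{i+j} / \sum_i w_i^2$ (the coefficient of $z^j$ in (47.37) after standardizing), and the function names are hypothetical.

```python
def iterate_filter(weights, k):
    """Weights of the filter obtained by applying `weights` k times in succession.

    Repeated application corresponds to repeated polynomial multiplication.
    """
    result = [1.0]
    for _ in range(k):
        new = [0.0] * (len(result) + len(weights) - 1)
        for i, a in enumerate(result):
            for j, b in enumerate(weights):
                new[i + j] += a * b
        result = new
    return result


def autocorrelation(weights, j):
    """Autocorrelation of order j of a weighted average of a purely random series."""
    num = sum(weights[i] * weights[i + j] for i in range(len(weights) - j))
    den = sum(w * w for w in weights)
    return num / den


for k in (1, 4, 16, 64):
    averaged = iterate_filter([0.5, 0.5], k)      # moving average of 2, k times
    differenced = iterate_filter([0.5, -0.5], k)  # first difference, k times
    print(k, autocorrelation(averaged, 1), autocorrelation(differenced, 1))
```

For these two filters the first autocorrelation works out to $k/(k+1)$ and $-k/(k+1)$ respectively, so the printed values move steadily towards $+1$ and $-1$ as $k$ grows.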
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^i^ng -«ge 


THE ad 

with weig hts 


tf" 1, ^^The autocovariance generating f Uncti ^ 

—.. 


„ _ >« we find 
Putting * - * «>(, 


constant. For variations in 


« the maximum value of tu(cc) occurs 


i ic some constant. * 

where c 0 may write j^U- C 47 4 .. 

when cos « = T ana , a) * Co { 1 -y( cos a " 2 ) ) ’ ^ 7 -44) 

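Thus, for $\cos \alpha - \tfrac{1}{2} = \varepsilon \ne 0$, say, the ordinate of $w(\alpha)$ tends to zero, as $k$ increases, compared
with the maximum ordinate. By continual iteration, therefore, the resultant series
tends to become periodic, with period $2\pi / \arccos \tfrac{1}{2} = 6$.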

Example 47.6 Slutzky's theorem

Take a moving average (of two terms, as in Example 47.4) of a random series $n$ times, and take the $m$th
difference of the result; then, as $n \to \infty$ with $m/n$ tending to some constant $\theta$ between
0 and 1, the series tends to a sine wave with wavelength $\lambda$ given by $2\pi/\lambda = \arccos \{(1-\theta)/(1+\theta)\}$.

Taking the $m$th difference is equivalent to taking first differences $m$ times. Hence
the autocovariance generating function of the resultant is given by

$$\Gamma(z) \propto (1 + z^{-1})^n (1 + z)^n (1 - z^{-1})^m (1 - z)^m,$$

and hence, putting $z = e^{i\alpha}$, we find

$$w(\alpha) \propto (1 - \cos \alpha)^m (1 + \cos \alpha)^n. \qquad (47.45)$$

We can evaluate the constant from the relation

$$\int_0^{\pi} w(\alpha) \, d\alpha = \pi$$

and find

$$w(\alpha) = \frac{\pi\, \Gamma(m+n+1)}{2^{m+n}\, \Gamma(m+\tfrac{1}{2})\, \Gamma(n+\tfrac{1}{2})} (1 - \cos \alpha)^m (1 + \cos \alpha)^n.$$

The maximum value occurs at $\alpha = \alpha_0$, when

$$\cos \alpha_0 = \frac{n-m}{n+m} = \frac{1-\theta}{1+\theta}, \qquad (47.46)$$

and we shall show that in the limit the series is not only periodic but a sine wave.
Using Stirling's approximation to the $\Gamma$ function, we find

$$w(\alpha) \sim \sqrt{\frac{\pi}{2}}\, \frac{(m+n)^{m+n+\frac{1}{2}}}{2^{m+n}\, m^m n^n} (1 - \cos \alpha)^m (1 + \cos \alpha)^n. \qquad (47.47)$$

If $\cos \alpha = \dfrac{n-m}{n+m} + \varepsilon$ this reduces to

$$w(\alpha) \sim \sqrt{\frac{\pi(m+n)}{2}} \Big\{ 1 - \frac{(m+n)\varepsilon}{2m} \Big\}^m \Big\{ 1 + \frac{(m+n)\varepsilon}{2n} \Big\}^n. \qquad (47.48)$$

For $\varepsilon = 0$ this tends to infinity. For $\varepsilon \ne 0$ it will be seen, on taking logarithms
and expanding, that the expression tends to zero uniformly in any closed interval
excluding $\varepsilon = 0$. Thus $w(\alpha)$ has a single infinite ordinate at $\alpha_0$ given by

$$\arccos \{(1-\theta)/(1+\theta)\}.$$

$W(\alpha)$ is accordingly a step function with a single step from 0 to $\pi$ at that point.
It follows from (47.22) that the autocorrelations of the resulting series are given by

$$\rho_j = \cos \alpha_0 j. \qquad (47.49)$$

Consider now a given stretch of the derived series, say $u_1, \ldots, u_N$ for fixed $N$ as
$n \to \infty$. We have

$$E \sum_{i=1}^{N-2} (u_{i+2} - 2\rho_1 u_{i+1} + u_i)^2 = 2(N-2) \operatorname{var} u \, \{1 - 2\rho_1^2 + \rho_2\},$$

which, in virtue of (47.49), becomes in the limit

$$2(N-2) \operatorname{var} u \, \{1 - 2\cos^2 \alpha_0 + \cos 2\alpha_0\} = 0.$$

Hence in the limit

$$u_{i+2} - 2\rho_1 u_{i+1} + u_i = 0. \qquad (47.50)$$

This is a difference equation of a sine curve.

For generalizations of Slutzky's result see Romanovsky (1932, 1933) and Moran (1949).
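The location of the maximum of (47.45) can be confirmed by a direct search. This is a rough sketch and not part of the original text: the values of $m$ and $n$ are illustrative, the search is a plain grid over $0 < \alpha < \pi$, and the function names are hypothetical. Logarithms are used so that large powers do not overflow.

```python
import math


def log_spectrum_shape(alpha, m, n):
    """Logarithm of (1 - cos a)^m (1 + cos a)^n, the shape of (47.45) up to a constant."""
    return m * math.log(1.0 - math.cos(alpha)) + n * math.log(1.0 + math.cos(alpha))


def peak_location(m, n, steps=20000):
    """Grid search for the alpha in the open interval (0, pi) that maximizes the shape."""
    best_alpha, best_value = None, -math.inf
    for i in range(1, steps):
        a = math.pi * i / steps
        v = log_spectrum_shape(a, m, n)
        if v > best_value:
            best_alpha, best_value = a, v
    return best_alpha


m, n = 10, 40  # illustrative values, theta = m / n = 0.25
alpha0 = math.acos((n - m) / (n + m))
# grid maximum, theoretical alpha_0, and the implied wavelength 2 * pi / alpha_0
print(peak_location(m, n), alpha0, 2.0 * math.pi / alpha0)
```

The grid maximum agrees with $\arccos\{(n-m)/(n+m)\}$ to within the grid spacing, and increasing $m$ and $n$ in the same ratio leaves the location unchanged while sharpening the peak.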


47.16 If $u_t$ is a stationary series the moving average

$$\xi_t = \beta_0 u_t + \beta_1 u_{t-1} + \ldots + \beta_h u_{t-h} \qquad (47.51)$$

is also stationary. In particular, if $u_t$ is a purely random process with zero mean the
autocorrelations are given by

$$\rho_j = \frac{\sum_{i=0}^{h-j} \beta_i \beta_{i+j}}{\sum_{i=0}^{h} \beta_i^2}, \quad j \le h; \qquad \rho_j = 0, \quad j > h. \qquad (47.52)$$

We have already seen in Example 46.7 that the correlogram may present an oscillatory
appearance for such an average.

47.17 Wold (1938) has proved a theorem on the conditions under which a specified
set of constants $\rho_1, \rho_2, \ldots, \rho_h$ can be the autocorrelations of a moving average of a
random series. Take the generating function

$$G(z) = 1 + \rho_1(z + z^{-1}) + \ldots + \rho_h(z^h + z^{-h}). \qquad (47.53)$$

Put

$$y = z + z^{-1}. \qquad (47.54)$$

This will transform $G(z)$ into a polynomial of degree $h$ in $y$, say $H(y)$. Then, for
the $\rho$'s to be autocorrelations of a moving average of extent $h + 1$ it is necessary and
sufficient that $H(y)$ has no real root of odd multiplicity in the interval $-2 < y < 2$.
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• THE 7 „ 0 and all other ,'s vanish. 

^ l+pt y. 



^isthenlH-M— ' .f^l^ualto-l/Fn This wiU l ie in 

This has a root of odd-order nutlttph-yj^ ^ moving ave rage 0 f extent 2 can havt 

This has +2 unless Pi < *• . eV ident. 

r?t:= />» = ••• = °'f e Ninths moving average from the relation 

" ^U » * he / S . , + ^ = S ft,' 1 AW. (47.55) 

G(z) = S M* +* > j=o n ^ 

o/i' 1° tions (for, on identifying powers of *, we shall have 
There will, in general, be S ° Static) However, only one of these gives roots of 
h equations each of which is 4 u * d . / d in virtue of another result to which w e 
<f ^'" t T£ ,£ X -P-We solution. From (47.54) it is seen that 

for any $y$, the roots in $z$ are given by

$$z^2 - yz + 1 = 0 \qquad (47.56)$$

and hence, having the product unity, lie one inside and one outside the unit circle.
From (47.56) we have

$$z = \tfrac{1}{2}\{y \pm (y^2 - 4)^{1/2}\}. \qquad (47.57)$$

Three cases arise:

(a) $H(y)$ has a complex root $y_1$. Then the conjugate $y_1^*$ is also a root. Thus the
corresponding quantities $z_1$, $z_1^{-1}$, $z_1^*$, $(z_1^*)^{-1}$ are roots of $G(z)$ and thus one of
$(z - z_1)(z - z_1^*)$ and $(z - z_1^{-1})(z - (z_1^*)^{-1})$ is a factor of $\sum \beta_j z^j$.

(b) $H(y)$ has a real root $> 2$ in modulus. Then, from (47.57), $z_1$ and $z_1^{-1}$ are both
real. One must be a root of $\sum \beta_j z^j = 0$ and this case then corresponds to real
roots of $\sum \beta_j z^j = 0$.

(c) $H(y)$ has a real root $< 2$ in modulus. In this case $z_1$ and $z_1^{-1}$ are conjugate complex
and of modulus unity. The factors $z - z_1$ and $z - z_1^{-1}$ are therefore both
contained in $\sum \beta_j z^j$ and $\sum \beta_j z^{-j}$ and therefore the root must be of even multiplicity.

The theorem follows.
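Wold's condition can be explored numerically. The sketch below is an illustration under stated assumptions, not the method of the text: instead of finding roots, it evaluates $H(y)$ on a grid over $-2 < y < 2$ and reports whether a negative value occurs, since a real root of odd multiplicity is a point at which $H(y)$ changes sign. A grid can miss a very narrow negative stretch, so a result of `False` is only indicative. It uses the recurrence $P_j = yP_{j-1} - P_{j-2}$ for $P_j = z^j + z^{-j}$ with $y = z + z^{-1}$; the function names are hypothetical.

```python
def wold_polynomial(rho, y):
    """H(y) = 1 + sum_j rho_j * P_j(y), where P_j(y) = z^j + z^-j and y = z + 1/z.

    rho is the list [rho_1, ..., rho_h]. P_j is built from the recurrence
    P_j = y * P_{j-1} - P_{j-2}, starting from P_0 = 2 and P_1 = y.
    """
    p_prev, p_curr = 2.0, y
    total = 1.0
    for r in rho:
        total += r * p_curr
        p_prev, p_curr = p_curr, y * p_curr - p_prev
    return total


def negative_inside(rho, steps=4000):
    """True if H(y) is negative at some grid point of the open interval (-2, 2)."""
    for i in range(1, steps):
        y = -2.0 + 4.0 * i / steps
        if wold_polynomial(rho, y) < 0.0:
            return True
    return False


# With only rho_1 non-zero, H(y) = 1 + rho_1 * y.
print(negative_inside([0.4]))  # rho_1 below one half
print(negative_inside([0.6]))  # rho_1 above one half
```

For a single non-zero autocorrelation the check fails once $\rho_1$ exceeds one half in absolute value, in agreement with the fact that a moving average of extent 2 cannot have a larger first autocorrelation.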


Autoregressive series

47.18 Consider now a series defined by

$$u_t = -\alpha_1 u_{t-1} - \alpha_2 u_{t-2} - \ldots - \alpha_h u_{t-h} + \varepsilon_t \qquad (47.58)$$

which (putting $\alpha_0 = 1$) we may write in the more compact form

$$\sum_{j=0}^{h} \alpha_j u_{t-j} = \varepsilon_t. \qquad (47.59)$$

Here $\varepsilon_t$ is a random variable. We shall suppose that successive values of $\varepsilon$ are
independent and have the same variance. If $D$ is an operator such that $Du_t = u_{t-1}$
we may write (47.59) as

$$\Big(\sum \alpha_j D^j\Big) u_t = \varepsilon_t,$$

giving the formal solution

$$u_t = \frac{\varepsilon_t}{\sum \alpha_j D^j} = \sum_{j=0}^{\infty} \beta_j \varepsilon_{t-j}, \qquad (47.60)$$

where the constants $\beta$ are related to the $\alpha$'s by the identity in $D$

$$\frac{1}{\sum \alpha_j D^j} = \sum \beta_j D^j. \qquad (47.61)$$

However, this is not the complete solution of the difference equation (47.59). Let
$z_1, z_2, \ldots, z_h$ be the roots of

$$z^h + \alpha_1 z^{h-1} + \ldots + \alpha_h = 0. \qquad (47.62)$$

Then the general solution of

$$\sum \alpha_j u_{t-j} = 0 \qquad (47.63)$$

may be written

$$u_t = \sum_{j=1}^{h} A_j z_j^t \qquad (47.64)$$

where the $A$'s are arbitrary constants.

We shall now assume that | Zj | < 1 for any /, namely that the roots of (47.62) all 
fall within the unit circle, and that they are all different. Then for large t the solution 
(47.64) damps out of existence. 

The series (47.59) is regarded as having "started up" a long time in the past.
Then the contribution of the solution (47.64) has disappeared and the complete solution
is, in fact, the particular solution (47.60).

We shall call a series of form (47.58) autoregressive. It is a type of moving average of
infinite extent. If $\varepsilon$ is ergodic, then so will be $u_t$, provided that $\sum \beta_j^2$ converges. This
proviso is, in fact, satisfied, provided that the roots of (47.62) are all within the unit
circle (cf. Exercise 47.19).

In practice it is rarely necessary to discuss the roots of equation (47.62). J. Wise
(1956), however, has shown by using a theorem of Routh, that the conditions concerning
the roots can be expressed as algebraic conditions on the $\alpha$'s themselves.
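The identity (47.61) determines the weights $\beta_j$ one at a time: equating coefficients of $D^k$ in $(\sum \alpha_j D^j)(\sum \beta_j D^j) = 1$ gives $\beta_0 = 1$ and $\beta_k = -\sum_{j \ge 1} \alpha_j \beta_{k-j}$. The following sketch illustrates that recursion and is not part of the original text; the function name and the numerical coefficients are chosen for illustration (the second set matches the scheme quoted in the heading of Table 47.4).

```python
def moving_average_weights(alpha, count):
    """First `count` weights beta_0, beta_1, ... of the expansion (47.61).

    alpha is the list [alpha_0, alpha_1, ..., alpha_h] with alpha_0 = 1.
    """
    h = len(alpha) - 1
    beta = [1.0]
    for k in range(1, count):
        s = 0.0
        for j in range(1, min(k, h) + 1):
            s += alpha[j] * beta[k - j]
        beta.append(-s)
    return beta


# h = 1 with alpha_1 = -0.6: the weights are the powers of 0.6.
print(moving_average_weights([1.0, -0.6], 5))

# h = 2 with alpha_1 = -1.1, alpha_2 = 0.5: the weights oscillate and die away.
weights = moving_average_weights([1.0, -1.1, 0.5], 40)
print(weights[:6], sum(b * b for b in weights))
```

When the roots of (47.62) lie inside the unit circle the weights decay geometrically, so the partial sums of $\beta_j^2$ settle down quickly.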


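47.19 It will have been observed that $u_t$ is dependent on $\varepsilon_t$, $\varepsilon_{t-1}$, etc., but not
on $\varepsilon_{t+1}$, $\varepsilon_{t+2}$, etc. Let us multiply

$$\sum_{j=0}^{h} \alpha_j u_{t-j} = \varepsilon_t \qquad (47.65)$$

by $u_{t-k}$, and take expectations. We then find

$$\rho_k + \alpha_1 \rho_{k-1} + \ldots + \alpha_h \rho_{k-h} = 0, \quad k > 0, \qquad (47.66)$$

a set of equations due to Yule (1927) and G. Walker (1931). In particular, since $\rho_{-j} = \rho_j$,
we have

$$\rho_1 + \alpha_1 + \alpha_2 \rho_1 + \ldots + \alpha_h \rho_{h-1} = 0$$

$$\rho_2 + \alpha_1 \rho_1 + \alpha_2 + \ldots + \alpha_h \rho_{h-2} = 0$$

and so on.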
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~ d " - “ <«» 

where the 0’ s are S iven * ( 

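Example 47.7 The Markoff series

Consider again the series of Example 47.2. This is the simplest case of an
autoregressive series (apart from the trivial case $h = 0$):

$$u_t + \alpha_1 u_{t-1} = \varepsilon_t,$$

which, for convenience, we shall write as

$$u_t - \rho u_{t-1} = \varepsilon_t. \qquad (47.68)$$

This is known as a Markoff series.

We have

$$\frac{1}{1 - \rho D} = 1 + \rho D + \rho^2 D^2 + \ldots$$

and hence

$$\beta_j = \rho^j. \qquad (47.69)$$

Hence

$$\operatorname{var} u = \operatorname{var} \Big\{ \sum_{j=0}^{\infty} \rho^j \varepsilon_{t-j} \Big\} = \operatorname{var} \varepsilon \sum_{j=0}^{\infty} \rho^{2j} = \frac{\operatorname{var} \varepsilon}{1 - \rho^2}. \qquad (47.70)$$

From the Yule-Walker equations (47.66) with $h = 1$ we have

$$\rho_j - \rho \rho_{j-1} = 0, \quad j > 0,$$

and in particular

$$\rho_1 = \rho. \qquad (47.71)$$

(This is, in fact, why we named as $\rho$ the parameter $-\alpha_1$.)

For the spectral density we have

$$w(\alpha) = \sum_{j=-\infty}^{\infty} \rho^{|j|} e^{i\alpha j} = -1 + \sum_{j=0}^{\infty} \rho^j e^{i\alpha j} + \sum_{j=0}^{\infty} \rho^j e^{-i\alpha j}$$

$$= -1 + \frac{1}{1 - \rho e^{i\alpha}} + \frac{1}{1 - \rho e^{-i\alpha}}$$

$$= \frac{1 - \rho^2}{1 - 2\rho \cos \alpha + \rho^2}. \qquad (47.73)$$

The correlogram and the power spectrum are shown in Fig. 47.4.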
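The closed form (47.73) can be compared with the defining series $1 + 2\sum \rho^j \cos \alpha j$. The sketch below is only an illustration and not part of the original text: it truncates the series after a fixed number of terms, the value of $\rho$ is chosen arbitrarily, and the function names are hypothetical.

```python
import math


def markoff_spectral_density(rho, alpha):
    """Closed form (47.73): (1 - rho^2) / (1 - 2 rho cos(alpha) + rho^2)."""
    return (1.0 - rho * rho) / (1.0 - 2.0 * rho * math.cos(alpha) + rho * rho)


def truncated_series(rho, alpha, terms=200):
    """1 + 2 * sum_{j=1}^{terms} rho^j cos(alpha j), the series form of w(alpha)."""
    return 1.0 + 2.0 * sum(rho ** j * math.cos(alpha * j) for j in range(1, terms + 1))


rho = 0.6  # illustrative value
for alpha in (0.0, 0.5, 1.0, 2.0, math.pi):
    print(alpha, markoff_spectral_density(rho, alpha), truncated_series(rho, alpha))
```

For positive $\rho$ the density is largest at $\alpha = 0$ and falls steadily towards $\alpha = \pi$, so that the long swings dominate; for negative $\rho$ the ordering is reversed.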


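Example 47.8 The Yule series

The next most complex form of linear autoregressive series is known by the name
of Yule and is given by

$$u_t + \alpha_1 u_{t-1} + \alpha_2 u_{t-2} = \varepsilon_t. \qquad (47.74)$$

From the first two Yule-Walker equations (47.66) we have

$$\rho_1 + \alpha_1 + \alpha_2 \rho_1 = 0$$

$$\rho_2 + \alpha_1 \rho_1 + \alpha_2 = 0,$$

giving

$$\rho_1 = -\frac{\alpha_1}{1 + \alpha_2} \qquad (47.75)$$

$$\rho_2 = -\alpha_2 + \frac{\alpha_1^2}{1 + \alpha_2} \qquad (47.76)$$

$$\alpha_1 = -\frac{\rho_1(1 - \rho_2)}{1 - \rho_1^2} \qquad (47.77)$$

$$\alpha_2 = -\frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}. \qquad (47.78)$$

These equations give the parameters $\alpha_1$, $\alpha_2$ in terms of the first two autocorrelations
and vice versa. More generally, if $\mu$, $\nu$ are the roots of

$$x^2 + \alpha_1 x + \alpha_2 = 0$$

we have

$$\rho_j = A\mu^j + B\nu^j$$

subject to initial conditions

$$\rho_0 = 1 = A + B$$

$$\rho_1 = A\mu + B\nu.$$

We then find

$$\rho_j = \frac{1}{(\mu - \nu)(1 + \mu\nu)} \{\mu^{j+1}(1 - \nu^2) - \nu^{j+1}(1 - \mu^2)\}. \qquad (47.79)$$

We can put this in a slightly more convenient form. Put

$$\mu = p e^{i\theta}, \qquad \nu = p e^{-i\theta}. \qquad (47.80)$$

We find

$$p = \sqrt{\alpha_2}, \qquad \cos \theta = -\frac{\alpha_1}{2\sqrt{\alpha_2}}, \qquad (47.81)$$

and (47.79) reduces to

$$\rho_j = \frac{p^j \sin(j\theta + \psi)}{\sin \psi} \qquad (47.82)$$

where

$$\tan \psi = \frac{1 + p^2}{1 - p^2} \tan \theta. \qquad (47.83)$$

The spectral density function is

$$w(\alpha) = \frac{(1 - \alpha_2)\{(1 + \alpha_2)^2 - \alpha_1^2\}}{(1 + \alpha_2)\{1 + \alpha_1^2 + \alpha_2^2 - 2\alpha_2 + 2\alpha_1(1 + \alpha_2) \cos \alpha + 4\alpha_2 \cos^2 \alpha\}}. \qquad (47.84)$$

Fig. 47.5 shows a typical correlogram and power spectrum.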




Fig. 47.5—Correlogram (left) and spectrum (right) of the Yule series 
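The damped-sine form of the Yule correlogram can be checked against the Yule-Walker recursion. This is a sketch under stated assumptions and not part of the original text: it supposes $\alpha_1^2 < 4\alpha_2$ and $0 < \alpha_2 < 1$, so that the roots are complex and inside the unit circle, and it uses the coefficients quoted in the heading of Table 47.4 ($\alpha_1 = -1.1$, $\alpha_2 = 0.5$). The function names are hypothetical.

```python
import math


def yule_autocorrelations(a1, a2, count):
    """rho_0, ..., rho_{count-1} from rho_k = -a1 * rho_{k-1} - a2 * rho_{k-2}.

    The starting values are rho_0 = 1 and rho_1 = -a1 / (1 + a2).
    """
    rho = [1.0, -a1 / (1.0 + a2)]
    for k in range(2, count):
        rho.append(-a1 * rho[k - 1] - a2 * rho[k - 2])
    return rho[:count]


def damped_sine_form(a1, a2, k):
    """p^k sin(k theta + psi) / sin(psi) with p = sqrt(a2), cos(theta) = -a1 / (2 p)
    and tan(psi) = (1 + p^2) tan(theta) / (1 - p^2). Assumes complex roots."""
    p = math.sqrt(a2)
    theta = math.acos(-a1 / (2.0 * p))
    psi = math.atan2((1.0 + p * p) * math.sin(theta), (1.0 - p * p) * math.cos(theta))
    return p ** k * math.sin(k * theta + psi) / math.sin(psi)


a1, a2 = -1.1, 0.5
recursive = yule_autocorrelations(a1, a2, 10)
for k, r in enumerate(recursive):
    print(k, round(r, 4), round(damped_sine_form(a1, a2, k), 4))
```

The two columns coincide, and the correlogram is seen to oscillate with a period of $2\pi/\theta$ while its amplitude shrinks by the factor $p$ at each step.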


Example 47.9

We return to the discussion at the end of 47.18. In a linear scheme of the
autoregressive type, if the roots of the characteristic equation do not lie inside the unit
circle the series is not ergodic. If some lie outside the circle the series "explodes."
It may still oscillate, but with ever-increasing amplitude.

If roots lie on the circle the general solutions do not damp away, and the
solution is then a "wandering series." Consider,
for example, the simple case $u_t = u_{t-1} + \varepsilon_t$. Clearly the particular solution is

$$u_t = \sum_{j} \varepsilon_j$$

and the variance of $u$ increases without limit.

Example 47.10

Suppose that in the Yule series $\alpha_2 = 1$. The coefficient $p$ of (47.81)
is then unity and the correlogram (47.82) does not damp. The system, in fact, ceases
to be ergodic.

47.20 The type of series exemplified by

$$u_t = \xi_t + \varepsilon_t, \qquad (47.85)$$

where $\xi_t$ is a deterministic harmonic term, may be regarded as a harmonic with
superposed error. It is sometimes known as a scheme of hidden periodicities.

There is a somewhat different type of process to which the name harmonic is
sometimes given, though it is of no practical importance. A series may consist of the sum
of a number of harmonics, say

$$u_t = \sum_j A_j \cos(\alpha_j t), \qquad (47.86)$$

where the $\alpha$'s are fixed but the $A$'s may vary from one realization to another. In this
case there will be certain linear relations of the Yule-Walker type

$$\sum_{j=0}^{h} c_j \rho_{j-k} = 0. \qquad (47.87)$$


Continuous series 

47.21 Up to this point we have been concerned with series which are defined or
observed at a set of discrete points. Some series, as we noted in Chapter 45, have a
continuous existence in time, and there are even situations where we can form a
continuous record, as for example in the devices which graph temperature on a rotating
drum. The fact that matter is ultimately discontinuous (if it is a fact) does not prevent
us from regarding this record as continuous.

For series which are defined by deterministic continuous functions, such as
polynomials or trigonometric functions, this correspondence between the assumed
continuity of reality and the defined continuity of mathematics rarely causes any conceptual
difficulty. But when we come to series of the stationary type in which there are jumps 
between successive points, expressed by random variables, we must consider this 
question of continuity more closely. Can we, in fact, have a continuous series which 
proceeds by random jumps, however small? Our own opinion is that we cannot; 
that there is something essentially antithetic between randomness and continuity. Any 
tendency, then, to take the mathematician’s customary leap from the discontinuous to 
the continuous case must be carefully controlled. It may well prove possible, of 
course, to approximate to discontinuous expressions by continuous ones, for example, 
to represent sums by integrals; but we must not forget the problems of interpretation 
which are involved. 


47.22 To deal with this subject rigorously requires a theory of stochastic integration 
which would take us beyond the scope of this book. But we may expound the basic 
results in an intuitive way as follows. 

Consider a continuous series u(t) defined in some interval —h to h. Taking the 
mean to be zero, which does not seriously limit our generality, we may define the vari¬ 
ance as 



(47.88) 
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h tends to infinity (. >«^ Serift 



Consider the transform of the autocorrelation function, say $\phi_r(p)$, defined

= '“ill J‘(47.90, 

Putting $q = t + \tau$, we reduce this to

KmUif* f n^e-'Mi^dtdq 

4/rJ _/< J —A 

= lim 4^5 J" u(t)e~ m dt J* ^ v(qY”dq. ( 47 . 91 ) 

Hence, if the transform of the series $u(t)$ is given by

$$\phi_u(p) = \int_{-\infty}^{\infty} u(t) e^{ipt} \, dt = a(p) + ib(p), \qquad (47.92)$$

we have, on letting $h$ tend to infinity in (47.89), for the transform of the autocorrelation
function

$$\phi_r(p) = a^2(p) + b^2(p) = |\phi_u(p)|^2. \qquad (47.93)$$

$\phi_r(p)$ is the continuous extension of the spectral density which (cf. (47.21)) is the
Fourier transform of the autocorrelations.

47.23 It is to be noted that, even for a continuous function defined over an infinite
interval, the autocorrelation function does not determine the series $u(t)$ uniquely. In
fact, given $\phi_r(p)$, we have from (47.93)

$$\phi_u(p) = |\phi_r(p)|^{1/2} e^{i\theta}, \qquad (47.94)$$

where $\theta$ is any arbitrary real function. We shall then have, on inverting for $u(t)$,

$$u(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_u(p) e^{-itp} \, dp = \frac{1}{2\pi} \int_{-\infty}^{\infty} |\phi_r(p)|^{1/2} e^{i\theta} e^{-itp} \, dp. \qquad (47.95)$$

Since $u(t)$ must be real, the imaginary part of the integral vanishes and we have

$$u(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} |\phi_r(p)|^{1/2} \cos(\theta - tp) \, dp, \qquad (47.96)$$

a result due to Wiener (1930). Here $\theta$ is an odd function of $p$ but is otherwise
(subject to convergence) arbitrary. Hence $\phi_r(p)$ does not uniquely determine $u(t)$. We
shall consider this from the point of view of spectral functions later.


Example 47.11

Consider an autocorrelation function of the type we have discussed for the Yule
series at (47.82),

$$\rho_k = \frac{p^k \sin(k\theta + \psi)}{\sin \psi}.$$

Consider what happens if we regard this as defined for all $k$, not merely integral values.
We may, with a slight change of notation, put

$$\rho(k) = \frac{e^{-qk} \sin(k\theta + \psi)}{\sin \psi}, \quad q > 0. \qquad (47.97)$$

When $k$ is negative we must use $|k|$ in this expression.

For the transform we have 


f“ H^'sin Ske+ v ) eitPdk 

v J-* smw 


q _, _JL _ (47.98) 

q 2 + (6+p) 2 q 2 + (6-p) 2 ' 

The variable p in the transform here is not to be confused with the damping factor 
p in p(k). 

It is to be noted in (47.98) that this spectrum is continuous with a maximum at 
p = 0. The physicist would be tempted to regard this spectrum as analogous to 
that of white light, every frequency being represented. It does not follow, however, 
that the series u(t) arises as the sum of a large number of harmonic terms with all 
possible frequencies, in the way that white light can be regarded as the composition 
of a number of resonators oscillating on all wavelengths. 

On this question of white light, let us consider the limiting case when a time-series
is defined at a series of small intervals $\Delta t$ and all autocorrelations are zero. We then
find, from (47.21), that the spectral density (on a scale with unit time-intervals) is
unity, or on a scale $\Delta t$ would be approximately $\Delta t / 2\pi$, namely a constant. Certain
physical systems do give rise to constant spectral densities, or to a series of equal
ordinates very close together. The communications engineer describes the situation 
as one of white noise. Since he is trying to transmit signals on a determined frequency 
this so-called noise is a nuisance (like that in a radio set) which affects his reception 
and acts as an error-like disturbance to the purity of the incoming signals. 


Filters and transfer functions 

47.24 Suppose that we have a series $u(t)$ and a system of weights $a(t)$. We may
form what is, in effect, a linear weighted average $v(t)$ by the formula

$$v(t) = \int_0^{\infty} a(\tau) u(t - \tau) \, d\tau. \qquad (47.99)$$

This average is over past values of $u(t)$, including the present value, and does not look
into the future. If $u(t)$ is defined at discontinuous points a similar sum may be defined.
For the spectrum, we consider

$$\phi_v(\alpha) = \int_{-\infty}^{\infty} v(t) e^{it\alpha} \, dt = \int_0^{\infty} \int_{-\infty}^{\infty} a(\tau) u(t - \tau) e^{it\alpha} \, dt \, d\tau$$

$$= \int_0^{\infty} a(\tau) e^{i\tau\alpha} \, d\tau \int_{-\infty}^{\infty} u(t) e^{it\alpha} \, dt.$$

Hence, taking the squares of the moduli,

$$|\phi_v(\alpha)|^2 = |\phi_u(\alpha)|^2 \, \Big| \int_0^{\infty} e^{i\tau\alpha} a(\tau) \, d\tau \Big|^2. \qquad (47.100)$$

The function

$$T(\alpha) = \int_0^{\infty} e^{i\tau\alpha} a(\tau) \, d\tau \qquad (47.101)$$

is sometimes called the frequency response function or transfer function. It is, in
essence, the c.f. (Fourier transform) of the weighting function $a(\tau)$. (47.93) and (47.100)
show that the spectral density of the output series is obtained from that of the original
series on multiplication by the square of the modulus of the transfer function. The
input series, $u(t)$, is modified by some linear average to give the output $v(t)$.

47.25 Within limits, it is possible to choose the transfer function so as to produce
from $u(t)$ an output $v(t)$ with emphasis on particular frequencies. Such a function, or
rather the set of weights $a(\tau)$, is then called a filter. This is not the happiest expression,
a filter removing impurities by withholding them, rather than transforming them; but it
will serve. We need not, and shall not, confine our usage to averages which extend 
over the past, as in (47.99). Thus, the ordinary moving averages which we considered 
in Chapter 46 are filters in this sense. 
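
The effect of a filter on particular frequencies can be illustrated numerically. The sketch below is a discrete analogue of (47.101), assuming weights given at unit time-intervals; the function names `transfer_function` and `power_gain` are ours and purely illustrative. It evaluates the squared modulus of the transfer function of a simple five-term moving average.

```python
import cmath
import math


def transfer_function(weights, alpha):
    """Discrete analogue of (47.101): sum over tau of a(tau) * exp(i * alpha * tau)."""
    return sum(a * cmath.exp(1j * alpha * tau) for tau, a in enumerate(weights))


def power_gain(weights, alpha):
    """Squared modulus of the transfer function, the factor applied to the input spectrum."""
    return abs(transfer_function(weights, alpha)) ** 2


# A simple moving average over the present value and the four preceding ones.
weights = [1 / 5] * 5
for alpha in (0.0, math.pi / 5, 2 * math.pi / 5, math.pi):
    print(f"alpha = {alpha:.3f}   |T(alpha)|^2 = {power_gain(weights, alpha):.4f}")
```

With these weights the gain is unity at zero frequency and vanishes at $\alpha = 2\pi/5$, so the average passes slow movements and removes an oscillation of period five.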

Partial autocorrelations 

47.26 If we write the linear autoregressive scheme in the form (47.58)

$$u_t = -\sum_{j=1}^{h}\alpha_j u_{t-j} + \varepsilon_t \tag{47.102}$$

we may regard it as a kind of predictive equation for $u_t$, which will then depend on two
factors, the systematic terms in $u_{t-j}$ which, as it were, express the effect on $u$ of its
own past history, and the random element $\varepsilon_t$, which can be considered as a disturbance.
We may then ask the questions which are usually asked in ordinary regression analysis:
given the autocorrelations $\rho$, what are the partial correlations expressing the dependence
of $u_t$ on previous terms when the effect of other intermediate previous terms has been
removed?

47.27 Consider first of all the Markoff scheme (47.68),

$$u_t = \rho u_{t-1} + \varepsilon_t \tag{47.103}$$

with $\rho_{13} = \rho^2$, $\rho_{12} = \rho_{32} = \rho$. Thus

$$\rho_{13.2} = \frac{\rho_{13}-\rho_{12}\rho_{32}}{\{(1-\rho_{12}^2)(1-\rho_{32}^2)\}^{1/2}} = 0. \tag{47.104}$$

It will easily be seen that, in fact, all the partial autocorrelations are zero. This is
otherwise obvious from the fact that in (47.103) the regression of $u_t$ concerns only $u_{t-1}$
as independent variable.

In short, for a Markoff scheme all partial autocorrelations vanish. The term $u_t$,
so far as it depends systematically on previous terms, is entirely explained by $u_{t-1}$.

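47.28 Now consider the Yule scheme (47.74),

$$u_t = -\alpha_1u_{t-1} - \alpha_2u_{t-2} + \varepsilon_t. \tag{47.105}$$

We have from (47.75) and (47.76)

$$\rho_1 = -\frac{\alpha_1}{1+\alpha_2},\qquad \rho_2 = -\alpha_2 + \frac{\alpha_1^2}{1+\alpha_2}.$$

The partial correlation between $u_t$ and $u_{t-2}$ is given by

$$\rho_{13.2} = \frac{\rho_2-\rho_1^2}{1-\rho_1^2} = -\alpha_2, \tag{47.106}$$

as we might expect.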

We can easily check that higher-order partials vanish. For example, the numerator
of $\rho_{14.23}$ is $\rho_{14.3}-\rho_{12.3}\rho_{42.3}$. This in turn has a numerator

$$(1-\rho_1^2)(\rho_3-\rho_2\rho_1) - (\rho_1-\rho_2\rho_1)(\rho_2-\rho_1^2). \tag{47.107}$$

Considering the determinant of the first three Yule-Walker equations (47.66), we have

$$\begin{vmatrix}\rho_1 & 1 & \rho_1\\ \rho_2 & \rho_1 & 1\\ \rho_3 & \rho_2 & \rho_1\end{vmatrix} = 0.$$

This, expanded by the first column, shows that the expression (47.107) vanishes.

Such results are, of course, obvious from the general theory of regression when 
we recall that the regressor variables and the regressand all have the same variance, so 
that the coefficients in the regression equation, being standardized, are equal to the 
partial correlations. 
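
These statements about partial autocorrelations are easily checked numerically. The following is a minimal sketch which computes partial autocorrelations from a given set of autocorrelations by the usual recursive solution of the Yule-Walker equations (the Durbin-Levinson recursion, a standard device that is not used in the text above). The function names and the numerical values of $\rho$, $\alpha_1$, $\alpha_2$ are illustrative assumptions.

```python
def partial_autocorrelations(rho):
    """rho[k-1] is the autocorrelation of lag k; returns the partials of lags 1..len(rho)."""
    pacf = []
    phi_prev = []
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            phi = [phi_kk]
        else:
            num = rho[k - 1] - sum(phi_prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf


def yule_autocorrelations(alpha1, alpha2, max_lag):
    """Autocorrelations of u_t = -alpha1*u_{t-1} - alpha2*u_{t-2} + e_t, lags 1..max_lag."""
    r = [1.0, -alpha1 / (1.0 + alpha2)]
    for _ in range(2, max_lag + 1):
        r.append(-alpha1 * r[-1] - alpha2 * r[-2])
    return r[1:]


markoff_partials = partial_autocorrelations([0.6 ** k for k in range(1, 6)])
yule_partials = partial_autocorrelations(yule_autocorrelations(-1.1, 0.5, 5))
print(markoff_partials)  # only the first entry differs from zero
print(yule_partials)     # the second entry is -alpha2; later entries vanish
```

For the Yule scheme the partial of lag 2 comes out as $-\alpha_2$, in agreement with (47.106), and all later partials are zero to rounding error.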


Infinite, semi-infinite and circular processes 

47.29 If we consider the linear autoregressive scheme

$$\sum_{j=0}^{h}\alpha_ju_{t-j} = \varepsilon_t \tag{47.108}$$

as generating a series of values of $u$, given those of $\varepsilon$, we are faced with a difficulty, or
rather, with the necessity for making a decision. We cannot find the value of $u$ at
some point, say $T$, without knowing those for $T-1, T-2, \ldots, T-h$. We may suppose
that these values are given, or otherwise known, for some $T_0$. From that point
onwards the series is ascertainable and we may say that it is semi-infinite, because it is
considered as extending to infinity in one direction, that of increasing $t$.



On the other hand, we may regard the series as extending back into the infinite past
as well as forward. We do not then know the "starting" values, if any. The series may
then be said to be infinite.

47.30 For certain mathematical purposes, which will become clear, we may also
define a circular series, analogous to those we have used in defining serial coefficients
in Chapter 45. We suppose, for some $N$,

$$u_t = u_{t+N},\qquad u_{t+1} = u_{t+N+1},\qquad \ldots,\qquad u_{t+h} = u_{t+N+h}.$$

It is, so far as we can see, impossible to imagine physical processes which generate
such a system. The best that can be said about it is that, should we be able to derive
results for the circular process in an exact form, there is some expectation that, by
letting $N$ tend to infinity, we may derive at least approximations to the semi-infinite
case. But even this is doubtful ... banishing it to infinity. We shall therefore need ...
results for the circular process.


47.31 The theoretical forms of correlograms exemplified in Figs. 47.4 and 47.5
are not followed very closely by the observed correlograms of short series ("short"
meaning ...). Exercises 47.20-22 give the

observed correlations of Tables 47.1, 47.2 and 47.4. Apart from irregularities such as 
might be expected from sampling effects, there are two other phenomena encountered 
in practice: (a) the serial correlations are biassed downwards, and (b) the correlations
of higher order do not damp out for Yule and Markoff schemes as quickly as might be 
expected. A theoretical explanation of these effects will be given in the following 
chapter. The problem of how to fit schemes of various kinds to stationary series and 
how to test hypotheses concerning them will be considered in Chapter 50. 


EXERCISES 


47.1 In a stationary series with $\rho_1 = \rho_2 = \rho$, show that $\rho > -\tfrac12$ and that

$$\rho_3 \ge \frac{4\rho^2-\rho-1}{\rho+1}.$$


47.2 For the Markoff series

$$u_t = \rho u_{t-1} + \varepsilon_t,$$

show that the cumulants of $u$ are connected with those of $\varepsilon$ by

$$\kappa_r(u) = \kappa_r(\varepsilon)/(1-\rho^r).$$

Hence show that for the coefficients $\lambda_r = \kappa_r/\kappa_2^{r/2}$,

$$\lambda_r(u) = \lambda_r(\varepsilon)\,\frac{(1-\rho^2)^{r/2}}{1-\rho^r}.$$

Deduce that in general $u$ is closer to normality than $\varepsilon$, but that it is not normal unless
$\varepsilon$ is normal.

47.3 Show that the Markoff scheme of the previous exercise can be written

$$u_t = \rho u_{t+1} + \eta_t$$

with

$$\eta_t = -\rho\,\varepsilon_{t+1} + (1-\rho^2)\sum_{j=0}^{\infty}\rho^j\varepsilon_{t-j}.$$

Hence show that $\eta_t$ is a random variable, with

$$\operatorname{var}\eta = \operatorname{var}\varepsilon,\qquad \operatorname{cov}(\eta_t, \eta_{t+k}) = 0,\quad k \ne 0.$$

47.4 Verify that for the spectral density function (47.73)

$$\int_0^\pi w(\alpha)\,d\alpha = \pi.$$

47.5 Verify that for the Yule autoregressive scheme equation (47.84) is true and that the
integral of $w(\alpha)$ over the range 0 to $\pi$ is equal to $\pi$.

47.6 Show that there exist four and only four moving averages of a random series with
correlograms

$$\rho_1 = -\frac{42}{85},\quad \rho_2 = \frac{4}{17},\quad \rho_3 = -\frac{8}{85},\quad \rho_4 = \rho_5 = \text{etc.} = 0.$$

These are $\tfrac15[8, -4, 2, -1]$, $\tfrac15[-1, 2, -4, 8]$, $\tfrac15[2, -1, 8, -4]$, $\tfrac15[-4, 8, -1, 2]$.

(Wold, 1938)
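
The correlogram quoted in Exercise 47.6 can be confirmed by direct computation. The sketch below uses exact rational arithmetic; the helper name `ma_autocorrelations` is ours. The common scale factor of the weights is omitted because it does not affect the autocorrelations.

```python
from fractions import Fraction


def ma_autocorrelations(weights):
    """Autocorrelations of lags 1, 2, ... of a moving average of a random series."""
    w = [Fraction(x) for x in weights]
    variance = sum(x * x for x in w)
    return [
        sum(w[i] * w[i + k] for i in range(len(w) - k)) / variance
        for k in range(1, len(w))
    ]


candidates = [[8, -4, 2, -1], [-1, 2, -4, 8], [2, -1, 8, -4], [-4, 8, -1, 2]]
for weights in candidates:
    print(weights, ma_autocorrelations(weights))
```

This only verifies that the four sets of weights share one correlogram; showing that there are no others is the substance of the exercise.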

47.7 In the general autoregressive scheme with random term $\varepsilon$, show that

$$\operatorname{var}\Big(\sum_j \alpha_ju_{t+j}\Big) = \operatorname{var}\varepsilon.$$


47.8 For the Yule autoregressive scheme show that

$$\frac{\operatorname{var}u}{\operatorname{var}\varepsilon} = \frac{1+\alpha_2}{(1-\alpha_2)\{(1+\alpha_2)^2-\alpha_1^2\}}.$$

47.9 Show that the autocorrelations of the $m$th differences of a random series are given by

$$\rho_j = (-1)^j\,\frac{(m!)^2}{(m+j)!\,(m-j)!}.$$

47.10 If any series is fitted by a Yule scheme with autocorrelations $\rho_j$, show that the
autocorrelations of the residuals, say $\sigma_j$, are given by

$$\sigma_j = \frac{(1+\alpha_1^2+\alpha_2^2)\rho_j + \alpha_1(1+\alpha_2)(\rho_{j+1}+\rho_{j-1}) + \alpha_2(\rho_{j+2}+\rho_{j-2})}{1+\alpha_1^2+\alpha_2^2 + 2\alpha_1(1+\alpha_2)\rho_1 + 2\alpha_2\rho_2}.$$


47.11 Show that a series for which the autocorrelation function is

$$\rho(j) = (\sin \lambda j)/\lambda j$$

has a continuous spectrum with a jump at the point $\lambda$.


47.12 Show that any linear autoregressive series can be represented as a combined sequence 
of Yule and Markoff series in which the error term e of one is the series-value of the next. 





47.13 If a. B are defined a» 2 » ^ cos 

A n j=i 

D f s «y sm Pi 
n 3=1 

. M+b , wh ere it is a component uncorrelated with sin 

show tha '' ,f + 

A — sin 

” L • „ for B Hence show that S 2 = A 2 + B 2 remains small as « inr 

with a similar expression fox B. tie nc n in Crea , 

ZZs a-e is small, in which case S - a . 

47.14 For a “continuous” series with autocovariance function

p{k) = e-*W 

show that the spectral density is given by ^ ^ 

“’ (a) = na^+a 2 ' 

(Cf. the characteristic function of a Cauchy distribution.) 

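47.15 A series obeys the relation

$$u_t = u_{t-1} + \varepsilon_t$$

where $\varepsilon_t$ is a random series with unit variance. It is divided into consecutive groups of
$m$ terms and the arithmetic mean of each group determined, say as $v_t$. Show that

$$\operatorname{var}(\Delta v_t) = (2m^2+1)/3m$$

and that

$$\operatorname{cov}(\Delta v_t, \Delta v_{t+1}) = \frac{m^2-1}{6m}.$$

(Working, 1960)

47.16 Let $\mathbf{U}$ be the $N \times N$ matrix

$$\mathbf{U} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}.$$

Show that $\mathbf{U}^j$ is a matrix with the diagonal of unities displaced $j$ places
to the right, and that $\mathbf{U}^N = \mathbf{0}$. Hence show that if $\mathbf{u}$ is a column vector $(u_1, \ldots, u_N)$ the
autoregressive scheme may be written

$$\Big(\sum_j \alpha_j\mathbf{U}^j\Big)\mathbf{u} = \boldsymbol{\varepsilon}.$$

Further show that for the dispersion matrix of $\mathbf{u}$

$$\mathbf{V}(\mathbf{u}) = \Big(\sum_j \alpha_j\mathbf{U}^j\Big)^{-1}\Big(\sum_j \alpha_j\mathbf{U}'^{\,j}\Big)^{-1}.$$

(Whittle, 1951)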


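47.17 Show further that the inverse of the dispersion matrix of $\mathbf{u}$ is given by

$$\mathbf{V}^{-1} = \Big(\sum_j \alpha_j\mathbf{U}'^{\,j}\Big)\Big(\sum_j \alpha_j\mathbf{U}^j\Big)$$

and hence that for a third-order linear autoregressive series with random errors the inverse
of the autodispersion matrix is given by

$$\mathbf{V}^{-1} = \begin{pmatrix}
1 & \alpha_1 & \alpha_2 & \alpha_3 & \cdots\\
\alpha_1 & 1+\alpha_1^2 & \alpha_1(1+\alpha_2) & \alpha_2+\alpha_1\alpha_3 & \cdots\\
\alpha_2 & \alpha_1(1+\alpha_2) & 1+\alpha_1^2+\alpha_2^2 & \alpha_1+\alpha_1\alpha_2+\alpha_2\alpha_3 & \cdots\\
\alpha_3 & \alpha_2+\alpha_1\alpha_3 & \alpha_1+\alpha_1\alpha_2+\alpha_2\alpha_3 & 1+\alpha_1^2+\alpha_2^2+\alpha_3^2 & \cdots\\
\vdots & & & & \text{etc.}
\end{pmatrix}$$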


47.18 Show that although the autocorrelation matrix of a series is of the Laurent type, 
its inverse is not. (Whittle, 1951) 


47.19 Referring to equation (47.63), show, by expanding the left-hand side in terms of 
partial fractions, that S converges if the roots of (47.62) are all different and lie within the 
unit circle. 


47.20 The following are the serial correlations of the data of Table 47.1 (wheat prices). 
Draw the correlogram. 


Order of correlation $k$ and serial correlation $r_k$:

   k      r_k       k      r_k       k      r_k       k      r_k
   1     0.562     16     0.158     31     0.060     46    -0.036
   2     0.103     17     0.109     32    -0.008     47    -0.013
   3    -0.075     18     0.002     33    -0.039     48     0.042
   4    -0.092     19    -0.075     34     0.007     49     0.062
   5    -0.082     20    -0.062     35     0.056     50     0.065
   6    -0.136     21    -0.021     36     0.010     51     0.050
   7    -0.211     22    -0.062     37    -0.004     52     0.009
   8    -0.261     23    -0.088     38    -0.015     53    -0.027
   9    -0.192     24    -0.084     39    -0.047     54    -0.053
  10    -0.070     25    -0.076     40    -0.047     55    -0.073
  11    -0.003     26    -0.091     41     0.008     56    -0.106
  12    -0.015     27    -0.052     42     0.034     57    -0.084
  13    -0.012     28    -0.032     43     0.065     58    -0.019
  14     0.047     29    -0.012     44     0.099     59     0.003
  15     0.101     30     0.059     45     0.009     60     0.010


47.21 The following are the serial correlations of Table 47.2 (marriage rates). Draw the 
correlogram. 


Order of correlation $k$ and serial correlation $r_k$:

   k      r_k       k      r_k
   1     0.563     11    -0.080
   2    -0.089     12    -0.136
   3    -0.498     13    -0.132
   4    -0.631     14    -0.058
   5    -0.467     15    -0.095
   6    -0.025     16    -0.126
   7     0.353     17    -0.036
   8     0.396     18     0.131
   9     0.254     19     0.209
  10     0.104     20     0.205



47.22 The following are the serial correlations of Table 47.4 (artificial series). Draw
the correlogram.

Order of correlation $k$ and serial correlation $r_k$:

   k      r_k       k      r_k       k      r_k
   1     0.70      11     0.05      21     0.05
   2     0.29      12    -0.17      22    -0.12
   3     0.01      13    -0.27      23    -0.28
   4    -0.17      14    -0.31      24    -0.43
   5    -0.27      15    -0.30      25    -0.57
   6    -0.25      16    -0.18      26    -0.56
   7    -0.13      17     0.12      27    -0.26
   8     0.07      18     0.29      28     0.02
   9     0.12      19     0.33      29     0.17
  10     0.05      20     0.22      30     0.27






















CHAPTER 48 

THE SAMPLING THEORY OF SERIAL CORRELATIONS

48.1 We defined the serial correlation of lag $k$ in 45.32 and remarked that for
some purposes simpler forms of definition were mathematically and computationally
more convenient. For large $n$ the definitions tend to equivalence. For large-sample
theory we shall therefore consider the standard error of the form

$$r_j = \frac{\frac1n\sum u_iu_{i+j}}{\frac1n\sum u_i^2} = c/v, \text{ say.} \tag{48.1}$$

As usual we may write parental or sample forms indifferently in the resulting expressions
and shall usually employ the autocorrelations $\rho$.

In accordance with the customary procedure we have

$$\delta r_j = \frac{\delta c}{v} - \frac{c\,\delta v}{v^2},$$

$$\operatorname{var} r_j = \frac{\operatorname{var} c}{v^2} - \frac{2c\operatorname{cov}(c,v)}{v^3} + \frac{c^2\operatorname{var} v}{v^4},$$

and, taking $v = 1$ without loss of generality,

$$\operatorname{var} r_j = \operatorname{var} c - 2c\operatorname{cov}(c,v) + c^2\operatorname{var} v. \tag{48.2}$$

To evaluate this expression we will derive a general result concerning the covariance
of two covariance terms. We have

$$\operatorname{cov}\Big(\frac1n\sum_a u_au_{a+s},\ \frac1n\sum_b u_bu_{b+s+t}\Big) = E\Big\{\frac1{n^2}\sum_a u_au_{a+s}\sum_b u_bu_{b+s+t}\Big\} - \rho_s\rho_{s+t}$$

$$= \frac1{n^2}E\Big\{\sum_{a,b}(u_au_{a+s}u_bu_{b+s+t})\Big\} - \rho_s\rho_{s+t}. \tag{48.3}$$

To evaluate the product-moments of order four in this expression we need some
further assumptions. Assume that the $u$'s are jointly normally distributed so that their
characteristic function is of the form

$$\exp\{-\tfrac12(\theta_a^2+\theta_{a+s}^2+\theta_b^2+\theta_{b+s+t}^2+2\rho_s\theta_a\theta_{a+s}+\text{etc.})\}. \tag{48.4}$$

For the coefficient of $\theta_a\theta_{a+s}\theta_b\theta_{b+s+t}$ we find

$$\rho_s\rho_{s+t}+\rho_{b-a}\rho_{b-a+t}+\rho_{b-a+s+t}\rho_{b-a-s}.$$

On summing over $a$, $b$, we find, for the covariance (48.3),

$$\frac1{n^2}\Big\{n^2\rho_s\rho_{s+t}+n\sum_{i=-\infty}^{\infty}\rho_i\rho_{i+t}+n\sum_{i=-\infty}^{\infty}\rho_{i+s+t}\rho_{i-s}\Big\}-\rho_s\rho_{s+t}
= \frac1n\sum_{i=-\infty}^{\infty}(\rho_i\rho_{i+t}+\rho_{i+s+t}\rho_{i-s}). \tag{48.5}$$

Now let us put $s = 0$, $t = 0$. We have

$$\operatorname{var} v = \frac2n\sum_{i=-\infty}^{\infty}\rho_i^2. \tag{48.6}$$

Putting $t = 0$, we have

$$\operatorname{var} c = \frac1n\sum_{i=-\infty}^{\infty}(\rho_i^2+\rho_{i+s}\rho_{i-s}). \tag{48.7}$$

Putting $s = 0$ and replacing $t$ by $s$, we have

$$\operatorname{cov}(c, v) = \frac2n\sum_{i=-\infty}^{\infty}\rho_i\rho_{i+s}. \tag{48.8}$$

Finally, substitution of these values in (48.2), with $s = j$ and $c = \rho_j$, gives us

$$\operatorname{var} r_j = \frac1n\sum_{i=-\infty}^{\infty}\{\rho_i^2+\rho_{i+j}\rho_{i-j}-4\rho_j\rho_i\rho_{i+j}+2\rho_i^2\rho_j^2\}. \tag{48.9}$$

The formula is due to Bartlett (1946).

48.2 This result shows us that, even for large samples with the simplifying assumption
of normality, the variance of $r_j$ depends on all the autocorrelations of the series,
and we cannot estimate them all directly from a finite series. Some results
can, however, be derived in the manner of the following examples.

Example 48.1

Consider in the first place the simple case when all parent autocorrelations are zero
(a random series). We then find, from (48.9),

$$\operatorname{var} r_j = \frac1n. \tag{48.10}$$

This verifies that the variance is of order $1/n$. In fact, the sampling formulae in this
case reduce to those of an ordinary correlation coefficient in bivariate normal samples
as is evident from the fact that the series is random.
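
For a given set of parent autocorrelations the sum in (48.9) is easily evaluated by machine. The sketch below truncates the infinite sum at a finite lag; the function name `bartlett_variance`, the truncation point `max_lag` and the numerical values are our assumptions, chosen only for illustration.

```python
def bartlett_variance(rho, j, n, max_lag=200):
    """Large-sample var r_j from (48.9), with the sum truncated at |i| <= max_lag.

    rho is a function giving the parent autocorrelation for any integer lag.
    """
    total = 0.0
    for i in range(-max_lag, max_lag + 1):
        total += (
            rho(i) ** 2
            + rho(i + j) * rho(i - j)
            - 4 * rho(j) * rho(i) * rho(i + j)
            + 2 * rho(i) ** 2 * rho(j) ** 2
        )
    return total / n


def random_series(k):
    return 1.0 if k == 0 else 0.0


def markoff(k, parameter=0.5):
    return parameter ** abs(k)


print(bartlett_variance(random_series, 1, 100))  # 1/n for a random series, as at (48.10)
print(bartlett_variance(markoff, 1, 100))        # Markoff scheme, lag 1
print(bartlett_variance(markoff, 30, 100))       # Markoff scheme, a high lag
```

For the Markoff scheme with $j = 1$ the sum collapses to $(1-\rho^2)/n$, which may be confirmed by hand; for high lags it approaches $(1+\rho^2)/\{n(1-\rho^2)\}$.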


Example 48.2

If $\rho_j$ and subsequent $\rho$'s are small, (48.9) reduces to approximately

$$\operatorname{var} r_j = \frac1n\sum_{i=-(j-1)}^{j-1}\rho_i^2. \tag{48.11}$$

It may be verified (we leave this as Exercise 48.1) that on similar assumptions

$$\operatorname{cov}(r_j, r_{j+t}) = \frac1n\sum_i\rho_i\rho_{i+t}. \tag{48.12}$$

Example 48.3

For the Markoff scheme (47.68) with parameter $\rho$, we have from (48.11), for large $j$,

$$\operatorname{var} r_j = \frac1n\sum_{i=-\infty}^{\infty}\rho^{2|i|} = \frac1n\,\frac{1+\rho^2}{1-\rho^2}. \tag{48.13}$$

Cf. Exercise 48.8.

Example 48.4

The approximate forms of (48.11) and (48.12) can be derived direct from the
autocorrelation generating function. For, in the notation of 47.14,

$$w(\alpha) = G(z) = \sum_i\rho_iz^i,$$

and hence

$$\{G(z)\}^2 = \sum_i\sum_j\rho_i\rho_jz^{i+j} = \sum_{a=-\infty}^{\infty}z^a\Big(\sum_i\rho_i\rho_{i+a}\Big), \tag{48.14}$$

so that $G^2$ is a generating function for the sums required.

For example, with the Markoff process

$$G(z) = -1+\frac1{1-\rho z}+\frac1{1-\rho z^{-1}},$$

$$G^2(z) = 1 + (1-\rho z)^{-2} + (1-\rho z^{-1})^{-2} - 2(1-\rho z)^{-1} - 2(1-\rho z^{-1})^{-1} + 2(1-\rho z)^{-1}(1-\rho z^{-1})^{-1}.$$

We then find for the coefficient of $z^0$

$$1+1+1-2-2+2(1+\rho^2+\rho^4+\ldots) = \frac{1+\rho^2}{1-\rho^2},$$

as at (48.13). Also the coefficient of $z^k$ is given by

$$(k+1)\rho^k - 2\rho^k + 2(\rho^k+\rho^{k+2}+\ldots) = \frac{\rho^k\{(k+1)-(k-1)\rho^2\}}{1-\rho^2}.$$

Hence the covariance of $r_j$ and $r_{j+k}$ in a Markoff scheme is, by (48.12),

$$\frac1n\,\frac{\rho^k\{(k+1)-(k-1)\rho^2\}}{1-\rho^2} \tag{48.15}$$

and the correlation between them is, using (48.13),

$$\frac{\rho^k\{(k+1)-(k-1)\rho^2\}}{1+\rho^2}. \tag{48.16}$$

The method may be extended to schemes of higher order (cf. Quenouille (1947a) and
Exercise 48.14).
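
The coefficient of $z^k$ in $G^2(z)$ for the Markoff process can be checked against a direct summation of $\sum_i \rho_i\rho_{i+k}$. This is a minimal sketch: the function names, the truncation point and the value 0.6 of the parameter are illustrative assumptions.

```python
def sum_of_products(parameter, k, max_lag=400):
    """Truncated direct evaluation of the sum over i of rho_i * rho_{i+k} for a Markoff scheme."""
    return sum(
        parameter ** abs(i) * parameter ** abs(i + k)
        for i in range(-max_lag, max_lag + 1)
    )


def closed_form(parameter, k):
    """Coefficient of z^k in G^2(z): rho^k {(k+1) - (k-1) rho^2} / (1 - rho^2)."""
    return parameter ** k * ((k + 1) - (k - 1) * parameter ** 2) / (1 - parameter ** 2)


for k in range(6):
    print(k, sum_of_products(0.6, k), closed_form(0.6, k))
```

The case $k = 0$ reproduces the factor $(1+\rho^2)/(1-\rho^2)$ of (48.13).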

Bias in the estimation of autocorrelations 

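48.3 If $r$ is of the form $A/\sqrt{(BC)}$, we have, writing $a$, $b$, $c$ for deviations from means,

$$r = \frac{E(A)+a}{[\{E(B)+b\}\{E(C)+c\}]^{1/2}} \tag{48.17}$$

and expanding in binomial series to the second order of approximation, we find

$$E(r) = \frac{E(A)}{\{E(B)E(C)\}^{1/2}}\Big\{1 - \frac{E(ab)}{2E(A)E(B)} - \frac{E(ac)}{2E(A)E(C)} + \frac{E(bc)}{4E(B)E(C)} + \frac{3E(b^2)}{8E^2(B)} + \frac{3E(c^2)}{8E^2(C)} + \ldots\Big\}. \tag{48.18}$$

Putting

$$A = \frac1{n-j}\sum_{i=1}^{n-j}u_iu_{i+j} - \frac1{(n-j)^2}\Big(\sum_{i=1}^{n-j}u_i\Big)\Big(\sum_{i=1}^{n-j}u_{i+j}\Big),$$

$$B = \frac1{n-j}\sum_{i=1}^{n-j}u_i^2 - \frac1{(n-j)^2}\Big(\sum_{i=1}^{n-j}u_i\Big)^2,$$

$$C = \frac1{n-j}\sum_{i=1}^{n-j}u_{i+j}^2 - \frac1{(n-j)^2}\Big(\sum_{i=1}^{n-j}u_{i+j}\Big)^2, \tag{48.19}$$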
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, hE APVA^ faction, using «he a sym p,o ti c c qui 

_ liHgM- lH Id,'" 13 "* *. M warb 

I ^ * i§'js* ~«- l Then we have 

■ *■***' „f the series be *** 

t the varisnee f i8 . 19 ), j f ,J s 

,ectations * n ^ g(C) 55 v | v 1 



< 4 8-20) 
> on takir 


\v-i)Pi 


m g 

(48.21) 


1 


/ n-3 

ft j v 1 „/ 

2 Ui U i +i 

1 S ‘- 1 

^ " 11 -J i ^ 1 


(48.22) 


1 / 1 ■ 


b'; 1 , *\ . — — S (^ fyPj—i 

S (v-JjPj+L Vi==0 
v »-o 


y<0 ' (48 ' 23) 

We have evaluated in (48.6-8) var $a$, var $b$ and cov $(a, b)$. Substituting for the various
quantities in (48.20) we find $E(r_j)$. We shall not write this out explicitly, but shall
consider some particular cases.

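Example 48.5

If the series is random, $\rho_k = 0$ except for $\rho_0 = 1$. We then find from (48.20)

$$E(r_j) = -\frac1{n-j}. \tag{48.24}$$

It so happens that in this case we can evaluate the bias exactly by using the definition

$$r_j = \frac{\frac1{n-j}\sum_{i=1}^{n-j}(u_i-\bar u)(u_{i+j}-\bar u)}{\frac1n\sum_{i=1}^{n}(u_i-\bar u)^2}.$$

Put $z_i = u_i - \bar u$. Then, since the expectation of $z_iz_k/\sum z^2$ is the same for every pair $i \ne k$,

$$E(r_j) = \frac{n}{n-j}E\Big(\frac{\sum z_iz_{i+j}}{\sum z_i^2}\Big) = nE\Big(\frac{z_iz_{i+j}}{\sum z_i^2}\Big) = \frac{n}{n(n-1)}E\Big(\frac{\sum_{i\ne k}z_iz_k}{\sum z_i^2}\Big)$$

$$= \frac1{n-1}E\Big(\frac{(\sum z_i)^2-\sum z_i^2}{\sum z_i^2}\Big) = -\frac1{n-1}, \tag{48.25}$$

since $\sum z_i = 0$. This agrees with (48.24) to order $n^{-1}$. It is rather remarkable that
there is a downward bias in $r$, even for a random series.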


Example 48.6

If the series is such that

$$\rho_1 = \rho,\qquad \rho_j = 0,\quad j \ne 0, 1,$$

we find from (48.20)

$$E(r_1) = \rho + \frac1{n-1}(1+\rho)(4\rho^2-2\rho-1), \tag{48.26}$$

$$E(r_2) = -\frac1{n-2}(1+2\rho+2\rho^2), \tag{48.27}$$

$$E(r_j) = -\frac1{n-j}(1+2\rho),\qquad j > 2. \tag{48.28}$$

Example 48.7

For the Markoff series (47.68) with parameter $\rho$ we find similarly

$$E(r_1) = \rho - \frac{1+3\rho}{n-1}, \tag{48.29}$$

$$E(r_j) = \rho^j - \frac1{n-j}\Big\{\frac{1+\rho}{1-\rho}(1-\rho^j)+2j\rho^j\Big\}. \tag{48.30}$$

The bias in all these cases is downwards and obviously may be quite serious. For
$\rho = \tfrac12$ in a Markoff series of 25 terms the mean value of $r_1$ would be about 0.4, not 0.5.


Quenouille’s correction 

48.4 In the manner of (40.28) we may use the simplification of Quenouille's
method of removing bias by splitting the series into two. If $r$ is the serial coefficient
for the whole series, and $r_{(1)}$, $r_{(2)}$ those for the two halves, we use

$$R = 2r - \tfrac12(r_{(1)} + r_{(2)}), \tag{48.31}$$

which will be unbiassed to order $n^{-1}$.

For some further results on bias see Marriott and Pope (1954), Kendall (1954), and
Quenouille (1956) (cf. 17.10, Vol. 2). White (1961) has obtained some results for the
Markoff case to order $n^{-3}$.
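
The size of the bias, and the effect of the correction (48.31), may be illustrated by a small sampling experiment. The sketch below is only an illustration under stated assumptions: a Markoff series with normal residuals, the serial coefficient with divisors $n-1$ and $n$, and arbitrary choices of $\rho$, $n$, the number of replications and the random seed. All function names are ours.

```python
import random


def serial_r1(u):
    """First serial correlation with mean correction: divisor n-1 above, n below."""
    n = len(u)
    mean = sum(u) / n
    z = [x - mean for x in u]
    numerator = sum(z[i] * z[i + 1] for i in range(n - 1)) / (n - 1)
    denominator = sum(x * x for x in z) / n
    return numerator / denominator


def quenouille_r1(u):
    """R = 2r - (r_(1) + r_(2))/2, the two halves being consecutive, as at (48.31)."""
    half = len(u) // 2
    return 2 * serial_r1(u) - 0.5 * (serial_r1(u[:half]) + serial_r1(u[half:]))


def markoff_series(rho, n, rng, burn_in=50):
    """n terms of u_t = rho*u_{t-1} + e_t with unit normal e, after a burn-in period."""
    u = rng.gauss(0.0, 1.0)
    series = []
    for t in range(burn_in + n):
        u = rho * u + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            series.append(u)
    return series


def simulate(rho=0.5, n=24, replications=4000, seed=1):
    """Mean of r_1 and of the corrected coefficient over repeated samples."""
    rng = random.Random(seed)
    raw_total = corrected_total = 0.0
    for _ in range(replications):
        u = markoff_series(rho, n, rng)
        raw_total += serial_r1(u)
        corrected_total += quenouille_r1(u)
    return raw_total / replications, corrected_total / replications


raw_mean, corrected_mean = simulate()
print("mean of r_1:", raw_mean)
print("mean of R  :", corrected_mean)
```

In runs of this kind the uncorrected mean falls well short of $\rho$, while the corrected mean lies much closer to it, at the cost of a larger sampling variance.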


Some exact results 

48.5 We now proceed to consider some exact results in the distribution theory of
serial correlations. As we might expect, exactitude has to be purchased at a price,
usually that of assuming normality in the parent series, but occasionally, also, of
simplifying the definition of the statistic under investigation. We first of all derive some
results (due to Moran) by the method of expectations. We then obtain some
distributions (due to R. L. Anderson and a series of later writers, notably Daniels) which
raise some quite new points in distribution theory.



48.6 Consider first the case of normality with zero autocorrelation. We have
already shown at (48.25) that

$$E(r_1) = -\frac1{n-1}. \tag{48.32}$$

Put

$$I = \frac{\sum_{i=1}^{n-1}z_iz_{i+1}}{\sum_{i=1}^{n}z_i^2},\qquad z_i = u_i-\bar u, \tag{48.33}$$

so that

$$r_1 = \frac{n}{n-1}I,\qquad E(I) = -\frac1n.$$

We have

$$E(I^2) = E\Big\{\Big(\sum_{i=1}^{n-1}z_iz_{i+1}\Big)^2\Big/\Big(\sum z_i^2\Big)^2\Big\}
= E\Big\{\Big(\sum z_i^2z_{i+1}^2 + 2\sum z_iz_{i+1}^2z_{i+2} + \sum\sum z_iz_{i+1}z_kz_{k+1}\Big)\Big/\Big(\sum z_i^2\Big)^2\Big\}, \tag{48.34}$$

where $i$, $i+1$, $k$, $k+1$ are all distinct. Thus

$$E(I^2) = E\Big[\Big(\sum z_i^2\Big)^{-2}\{(n-1)z_1^2z_2^2+2(n-2)z_1^2z_2z_3+(n-2)(n-3)z_1z_2z_3z_4\}\Big]$$

or, in terms of the augmented symmetric functions (12.5, Vol. 1),

$$E(I^2) = E\Big[\frac1{(2)^2}\Big\{\frac{[2^2]}{n}+\frac{2[21^2]+[1^4]}{n(n-1)}\Big\}\Big]. \tag{48.35}$$

Using Appendix Table 10, we express the augmented symmetric functions in terms of
power-sums, obtaining, since (1) = 0 here,

$$E(I^2) = E\Big[\frac1n\Big\{1-\frac{(4)}{(2)^2}\Big\}+\frac1{n(n-1)}\Big\{\frac{4(4)}{(2)^2}-2\Big\}+\frac1{n(n-1)}\Big\{3-\frac{6(4)}{(2)^2}\Big\}\Big]. \tag{48.36}$$

Now in normal samples $k_4/k_2^2$ is independent of $k_2$ (cf. 37.27) and, using (12.28),

$$\frac{k_4}{k_2^2} = \frac{n-1}{(n-2)(n-3)}\Big\{n(n+1)\frac{(4)}{(2)^2}-3(n-1)\Big\}. \tag{48.37}$$

If the sample is normal, the expectation of $k_4/k_2^2$ vanishes, giving, from (48.37),

$$E\Big\{\frac{(4)}{(2)^2}\Big\} = \frac{3(n-1)}{n(n+1)}. \tag{48.38}$$

Substitution then gives

$$E(I^2) = \frac{n^2-3n+3}{n^2(n-1)}, \tag{48.39}$$

and hence

$$\operatorname{var} I = \frac{(n-2)^2}{n^2(n-1)},\qquad \operatorname{var} r_1 = \frac{(n-2)^2}{(n-1)^3}. \tag{48.40}$$

It may be noted that, to order $n^{-1}$,

$$\operatorname{var} r_1 = \frac1{n+1}. \tag{48.41}$$

Moran (1947-8) gives results for the circular definition and also derives third
and fourth moments (cf. Exercises 48.6, 48.12). Jenkins has used similar
methods for the joint distribution of serial correlations (cf. 48.26 and Exercises 48.18
and 48.19).

R. L. Anderson's distribution

48.7 For reasons which will become evident later, we consider the
distribution of the first circular serial correlation

$$r_1 = \frac{u_1u_2+u_2u_3+\ldots+u_nu_1-n\bar u^2}{\sum_{i=1}^{n}(u_i-\bar u)^2}. \tag{48.42}$$

Following R. L. Anderson (1942) we consider the distribution of this statistic in samples
from an independent normal series. We now drop the suffix to $r$.

We shall seek a linear transformation to variables $\xi_i$ ($i = 1, 2, \ldots, n$) such that $r$
transforms to $\sum\lambda_i\xi_i^2/\sum\xi_i^2$. The point about using a circular definition is that we
can determine the $\lambda$'s explicitly.

Any orthogonal transformation will transform the denominator of $r$ to the required
form. The numerator of $r$ is equal to $q$, say, with

$$q = \Big(1-\frac2n\Big)\sum u_iu_{i+1} - \frac1n\sum u_i^2 - \frac2n\sum u_iu_k,\qquad k > i+1. \tag{48.43}$$

Consider

$$q - \lambda\sum u_i^2. \tag{48.44}$$

As in the case of principal components (cf. 43.6), if we determine $\lambda$ so as to maximize
this quantity, we shall arrive at the sum $\sum\lambda\xi^2$ as desired. We do not need to find the
transformation: the $\lambda$'s are all that we require, and from (48.43-4) they are the roots of

$$\begin{vmatrix}
-\big(\lambda+\frac1n\big) & \frac12\big(1-\frac2n\big) & -\frac1n & \cdots & -\frac1n & \frac12\big(1-\frac2n\big)\\
\frac12\big(1-\frac2n\big) & -\big(\lambda+\frac1n\big) & \frac12\big(1-\frac2n\big) & \cdots & -\frac1n & -\frac1n\\
\vdots & & & & & \vdots\\
\frac12\big(1-\frac2n\big) & -\frac1n & -\frac1n & \cdots & \frac12\big(1-\frac2n\big) & -\big(\lambda+\frac1n\big)
\end{vmatrix} = 0. \tag{48.45}$$

This is a circulant determinant and can be factorized. Consider, in fact, the circulant

$$D = \begin{vmatrix}
a_1 & a_2 & \cdots & a_n\\
a_n & a_1 & \cdots & a_{n-1}\\
\vdots & & & \vdots\\
a_2 & a_3 & \cdots & a_1
\end{vmatrix}. \tag{48.46}$$

Let $\omega_1, \omega_2, \ldots, \omega_n$ ($\omega_n = 1$) be the $n$ roots of unity. Multiply the $j$th column
by $\omega_k^{j-1}$ and sum for $j$. It will be seen that $\sum_j a_j\omega_k^{j-1}$ is a factor of $D$, and hence

$$D = \prod_{k=1}^{n}\sum_{j=1}^{n}a_j\omega_k^{j-1}. \tag{48.47}$$

Now

$$\sum_{j=1}^{n}\omega_k^{j-1} = 0,\quad k \ne n;\qquad = n,\quad k = n,$$

and thus

$$\sum_{j=3}^{n-1}\omega_k^{j-1} = n-3,\quad k = n;\qquad = -(1+\omega_k+\omega_k^{-1}),\quad k \ne n. \tag{48.48}$$

Putting the appropriate values of the $a$'s, from (48.45), in (48.47), and using (48.48),
we find, on some reduction,

$$D = -\lambda\prod_{k=1}^{n-1}\{-\lambda+\tfrac12(\omega_k+\omega_k^{-1})\}$$

and, since $\tfrac12(\omega_k+\omega_k^{-1}) = \cos(2\pi k/n)$, we have

$$D = -\lambda\prod_{k=1}^{n-1}\Big(-\lambda+\cos\frac{2\pi k}{n}\Big). \tag{48.49}$$

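Thus the roots are $\lambda = 0$, $\lambda = \cos(2\pi k/n)$, $k = 1, 2, \ldots, n-1$. It is crucial to
observe that the $\lambda$-roots occur in pairs, except perhaps for one which is minus unity
(when $n$ is even). The paired terms may be put together to give $v_j$ (= sum of squares of
two $\xi$'s). Thus we have

$$q = \sum_{j=1}^{\frac12(n-1)}\lambda_jv_j,\qquad n \text{ odd}, \tag{48.50}$$

$$q = \sum_{j=1}^{\frac12(n-2)}\lambda_jv_j - v,\qquad n \text{ even}, \tag{48.51}$$

where $v_j$ is distributed as $\chi^2$ with 2 d.fr. and $v$ with one d.fr. The denominator of $r$,
say $p$, is distributed as $\chi^2$ with $n-1$ d.fr. Our problem is then reduced to finding the
distribution of

$$r = \sum\lambda_jv_j\Big/\sum v_j,\qquad n \text{ odd}, \tag{48.52}$$

$$r = \Big(\sum\lambda_jv_j - v\Big)\Big/\Big(\sum v_j+v\Big),\qquad n \text{ even}. \tag{48.53}$$

48.8 The problem was solved by R. L. Anderson. We illustrate the method for the
case $n = 6$, when

$$r = \frac{\lambda_1v_1+\lambda_2v_2-v}{p_6}, \tag{48.54}$$

where

$$p_6 = v_1+v_2+v, \tag{48.55}$$

$$\lambda_1 = \cos\frac{2\pi}{6} = \tfrac12,\qquad \lambda_2 = \cos\frac{4\pi}{6} = -\tfrac12. \tag{48.56}$$

The joint distribution of the $v$'s is given by

$$dF(v_1, v_2, v) = \frac1{4(2\pi)^{1/2}}v^{-1/2}e^{-\frac12v}e^{-\frac12v_1}e^{-\frac12v_2}\,dv\,dv_1\,dv_2
= \frac1{4(2\pi)^{1/2}}v^{-1/2}e^{-\frac12p_6}\,dv\,dv_1\,dv_2. \tag{48.57}$$

We have to consider two cases,

$$\lambda_2 \le r \le \lambda_1$$

and

$$-1 \le r \le \lambda_2.$$

Note from (48.52-3) that $r \le \lambda_1$, the larger of the roots in $\lambda$. Consider the first case.
From (48.54) we have

$$v_2 = \frac1{\lambda_1-\lambda_2}\{p_6(\lambda_1-r)-v(1+\lambda_1)\}, \tag{48.58}$$

$$v_1 = \frac1{\lambda_1-\lambda_2}\{p_6(r-\lambda_2)+v(1+\lambda_2)\}. \tag{48.59}$$

The Jacobian of the transformation will be found to be

$$\frac{\partial(v_1, v_2)}{\partial(p_6, r)} = \frac{p_6}{\lambda_1-\lambda_2}.$$

Hence, from (48.57) the distribution of $p_6$, $r$ and $v$ is given by

$$dF = \frac1{4(2\pi)^{1/2}}\,\frac{p_6}{\lambda_1-\lambda_2}\,v^{-1/2}e^{-\frac12p_6}\,dp_6\,dr\,dv. \tag{48.60}$$

Integrating out for $v$, noting the limits determined by (48.58) and (48.59), we have

$$dF = \frac1{4(2\pi)^{1/2}}\,\frac{p_6e^{-\frac12p_6}}{\lambda_1-\lambda_2}\Big[\int_0^{p_6(\lambda_1-r)/(1+\lambda_1)}v^{-1/2}\,dv\Big]dr\,dp_6
= \frac1{2(2\pi)^{1/2}}\,\frac{p_6^{3/2}e^{-\frac12p_6}(\lambda_1-r)^{1/2}}{(\lambda_1-\lambda_2)(1+\lambda_1)^{1/2}}\,dr\,dp_6. \tag{48.61}$$

Finally, integrating for $p_6$ from 0 to $\infty$, we obtain

$$f(r) = \frac32\,\frac{(\lambda_1-r)^{1/2}}{(\lambda_1-\lambda_2)(1+\lambda_1)^{1/2}},\qquad \lambda_2 \le r \le \lambda_1. \tag{48.62}$$

For the other part of the range we find similarly

$$f(r) = \frac32\Big\{\frac{(\lambda_1-r)^{1/2}}{(\lambda_1-\lambda_2)(1+\lambda_1)^{1/2}}+\frac{(\lambda_2-r)^{1/2}}{(\lambda_2-\lambda_1)(1+\lambda_2)^{1/2}}\Big\},\qquad -1 \le r \le \lambda_2. \tag{48.63}$$

48.9 It is typical of these distributions that they split into separate analytical
expressions for values of $r$ between the critical roots $\lambda$. The frequency curves are
continuous, but the derivatives not necessarily so. It is also typical that the distribution
functions are easily written down. For example, corresponding to (48.62) and (48.63)
we have

$$\operatorname{Prob}(R > r) = \frac{(\lambda_1-r)^{3/2}}{(\lambda_1-\lambda_2)(1+\lambda_1)^{1/2}},\qquad \lambda_2 \le r \le \lambda_1, \tag{48.64}$$

$$\operatorname{Prob}(R > r) = \frac{(\lambda_1-r)^{3/2}}{(\lambda_1-\lambda_2)(1+\lambda_1)^{1/2}}+\frac{(\lambda_2-r)^{3/2}}{(\lambda_2-\lambda_1)(1+\lambda_2)^{1/2}},\qquad -1 \le r \le \lambda_2. \tag{48.65}$$

In general form we quote the results

$$\operatorname{Prob}(R > r) = \sum_{i=1}^{m}\frac{(\lambda_i-r)^{\frac12(n-3)}}{\prod_{j\ne i}(\lambda_i-\lambda_j)},\qquad \lambda_{m+1} \le r \le \lambda_m,\quad n \text{ odd}, \tag{48.66}$$

$$\operatorname{Prob}(R > r) = \sum_{i=1}^{m}\frac{(\lambda_i-r)^{\frac12(n-3)}}{\prod_{j\ne i}(\lambda_i-\lambda_j)\,(1+\lambda_i)^{1/2}},\qquad \lambda_{m+1} \le r \le \lambda_m,\quad n \text{ even}. \tag{48.67}$$

The frequency functions are in $\tfrac12(n-1)$ or $\tfrac12(n-2)$ pieces according to the parity
of $n$.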


48.10 We shall have to pass over rather cursorily a number of features of this
distribution which are of statistical interest.

(a) For $r_l$, with $l$ not equal to 1, the circulant has factors typified by $\lambda - \cos(2\pi lk/n)$.
If $n$ is prime the roots are the same as before and the distribution remains
unchanged. If not, it would seem that the analytical form is different.

(b) There are other ways of obtaining a circulant without assuming a circular definition
of the coefficient. For example,

of the coe cien a v iy+a„ + i« ! »vi+ • • • + u n-i u s 

—- 2u*^ 

will be found to have the characteristic property of paired roots in $\lambda$, and therefore
follows Anderson’s distribution. Other cases are given by Durbin and Watson 

(1950-1). 


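48.11 We proceed to derive the characteristic function of $q$ and $p$. Taking
temporarily $s$, $t$ as the dummy variables, we have for the joint c.f. of $q$ and $p$, the
numerator and denominator of $r$,

$$\phi(s, t) \propto \int\exp\Big[-\tfrac12\Big\{\sum u^2 - 2it\sum(u-\bar u)^2 - 2is(u_1u_2+\ldots+u_nu_1-n\bar u^2)\Big\}\Big]\,du = \Delta^{-1/2}, \tag{48.68}$$

where

$$\Delta = \begin{vmatrix}
1-2it\big(1-\frac1n\big)+\frac{2is}{n} & -is+\frac{2it}{n}+\frac{2is}{n} & \frac{2it}{n}+\frac{2is}{n} & \cdots & -is+\frac{2it}{n}+\frac{2is}{n}\\
-is+\frac{2it}{n}+\frac{2is}{n} & 1-2it\big(1-\frac1n\big)+\frac{2is}{n} & -is+\frac{2it}{n}+\frac{2is}{n} & \cdots & \frac{2it}{n}+\frac{2is}{n}\\
\vdots & & & & \vdots\\
-is+\frac{2it}{n}+\frac{2is}{n} & \frac{2it}{n}+\frac{2is}{n} & \cdots & -is+\frac{2it}{n}+\frac{2is}{n} & 1-2it\big(1-\frac1n\big)+\frac{2is}{n}
\end{vmatrix}. \tag{48.69}$$

This is the same kind of circulant which we had at (48.45) and reduces as at (48.49) to

$$\Delta = \prod_{k=1}^{n-1}\Big\{1-2it-2is\cos\frac{2\pi k}{n}\Big\}. \tag{48.70}$$

Taking logarithms and identifying coefficients we get for the cumulants of $q$ and $p$,

$$\kappa_{ij} = \frac{i!\,j!}{i+j}\,2^{i+j-1}\binom{i+j}{j}\sum_{k=1}^{n-1}\Big(\cos\frac{2\pi k}{n}\Big)^i \tag{48.71}$$

$$= 2^r\,r!\sum_{k=1}^{n-1}\Big(\cos\frac{2\pi k}{n}\Big)^i,\qquad r = i+j-1. \tag{48.72}$$

Now

$$\sum\cos\frac{2\pi k}{n} = -1 = \sum\cos^3\frac{2\pi k}{n} = \sum\cos^5\frac{2\pi k}{n} = \text{etc.}, \tag{48.73}$$

$$\sum\cos^2\frac{2\pi k}{n} = \tfrac12(n-2),\qquad \sum\cos^4\frac{2\pi k}{n} = \tfrac18(3n-8). \tag{48.74}$$

Substituting, we find

$$\kappa_{10} = -1;\quad \kappa_{01} = n-1;\quad \kappa_{20} = n-2;\quad \kappa_{11} = -2;\quad \kappa_{02} = 2(n-1);$$

$$\kappa_{30} = -8;\quad \kappa_{21} = 4(n-2);\quad \kappa_{12} = -8;\quad \kappa_{03} = 8(n-1). \tag{48.75}$$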

It will be seen that $\kappa_{20}$ and $\kappa_{02}$ are of order $n$, and the higher-order cumulants are
of no higher order. Thus, in standard measure, the higher cumulants tend to
zero, and the distribution accordingly tends to bivariate normality. Moreover $\kappa_{11}$ in
standard measure tends to zero, so that $q$ and $p$ are uncorrelated and hence, through
normality, independent in the limit.

In fact, $r$ and $p$ are independent for normal variation and

$$E(r^mp^m) = E(r^m)E(p^m) = E(q^m),\qquad E(r^m) = \frac{E(q^m)}{E(p^m)}. \tag{48.76}$$

We can then evaluate the moments of $r$ from (48.75), finding

$$\mu_1'(r) = -\frac1{n-1},\qquad \mu_2(r) = \frac{n(n-3)}{(n+1)(n-1)^2}, \tag{48.77}$$

$$\mu_3(r) = \frac{4n(n-5)}{(n-1)^3(n+1)(n+3)}. \tag{48.78}$$

The mean agrees with the exact non-circular result at (48.25), but the variance is only 
the same to order n~ x as the non-circular variance (48.40). 
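
The moments of the circular coefficient follow mechanically from the roots $\cos(2\pi k/n)$, and the closed forms may be checked in this way. The sketch below computes the first three cumulants of $q$ from the roots, converts them to moments about the origin, divides by the moments of $p$ (a $\chi^2$ variable with $n-1$ d.fr.) as at (48.76), and forms the central moments of $r$. The function name is ours, and $n$ is assumed large enough for the sums (48.73-4) to hold.

```python
import math


def circular_r_moments(n):
    """Mean, variance and third central moment of the circular r_1 in a normal random series."""
    roots = [math.cos(2 * math.pi * k / n) for k in range(1, n)]
    k1 = sum(roots)                          # kappa_10
    k2 = 2 * sum(x ** 2 for x in roots)      # kappa_20
    k3 = 8 * sum(x ** 3 for x in roots)      # kappa_30
    q_moments = [k1, k2 + k1 ** 2, k3 + 3 * k2 * k1 + k1 ** 3]
    p_moments = [n - 1, (n - 1) * (n + 1), (n - 1) * (n + 1) * (n + 3)]
    m1, m2, m3 = (a / b for a, b in zip(q_moments, p_moments))
    return m1, m2 - m1 ** 2, m3 - 3 * m2 * m1 + 2 * m1 ** 3


for n in (7, 10):
    print(n, circular_r_moments(n))
```

For $n = 10$ this gives a mean of $-1/9$ and a variance of $70/891$, as required by (48.77).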

48.12 Dixon (1944) obtained an approximate form for Anderson's distribution by
an ingenious smoothing of the characteristic function (48.68-70). Write temporarily

$$\alpha = 1-2it,\qquad \beta = -2is,\qquad \theta_k = 2\pi k/n. \tag{48.79}$$

We then have approximately

$$\phi(s, t) \propto \prod_{k=1}^{n-1}(\alpha+\beta\cos\theta_k)^{-1/2}
= (\alpha+\beta)^{1/2}\exp\Big[-\tfrac12\sum_{k=1}^{n}\log(\alpha+\beta\cos\theta_k)\Big]$$

$$= (\alpha+\beta)^{1/2}\exp\Big[-\frac{n}{4\pi}\,\frac{2\pi}{n}\sum_{k=1}^{n}\log(\alpha+\beta\cos\theta_k)\Big]
\doteq (\alpha+\beta)^{1/2}\exp\Big[-\frac{n}{4\pi}\int_0^{2\pi}\log(\alpha+\beta\cos\theta)\,d\theta\Big]$$

$$= (\alpha+\beta)^{1/2}\exp\big[-\tfrac12n\log\{\tfrac12(\alpha+(\alpha^2-\beta^2)^{1/2})\}\big]
= 2^{\frac12n}(\alpha+\beta)^{1/2}\{\alpha+(\alpha^2-\beta^2)^{1/2}\}^{-\frac12n}. \tag{48.80}$$

By successive differentiation, and integrating for $p$, we can now obtain moments of $r$,
and find the unexpectedly simple forms

$$\mu_1' = -\frac1{n-1}, \tag{48.81}$$

$$\mu_2' = \frac1{n+1}, \tag{48.82}$$

$$\mu_3' = -\frac3{(n-1)(n+3)}, \tag{48.83}$$

$$\mu_4' = \frac3{(n+1)(n+3)}. \tag{48.84}$$

It may be shown (Dixon, 1944) that these moments are, in fact, exact. The even
moments are those of the distribution

$$dF = \frac{\Gamma(\tfrac12n+\tfrac12)}{\Gamma(\tfrac12n)\Gamma(\tfrac12)}(1-r^2)^{\frac12(n-2)}\,dr. \tag{48.85}$$

Thus, from (16.62), the squared serial correlation has the same distribution,
approximately, as the squared ordinary correlation coefficient in samples of $n+2$ from an
uncorrelated normal population.

Dixon (1944) also treats the case when the series has known (zero) mean and the 
case of a coefficient of lag / > 1. The same result holds, with n +1 replacing n when 
the mean is known, up to (n/2m) moments, where m is the largest common factor of 
/ and n. Cf. Exercise 48.20. 

48.13 Koopmans (1942) reached the same result by a different route. He expressed
the c.f. of $p$ and $q$ as a contour integral and smoothed the values of $\lambda$, as above, by
spreading them uniformly round a circle before integrating. This led him to the expression

$$h(r) = \frac{(\tfrac12n-1)2^{\frac12n}}{\pi}\int_0^{\arccos r}(\cos\alpha-r)^{\frac12n-2}\sin\tfrac12n\alpha\,\sin\alpha\,d\alpha. \tag{48.86}$$

Rubin (1945) evaluated the integral by showing, with the aid of a partial integration, that

$$\frac{d}{dr}h(r, n) = -nr\,h(r, n-2)$$

and proceeding by induction.

48.14 We had better pause at this point, and review progress. Our large-sample
results are explicit, but troublesome to apply, and in any case, large samples in
time-series analysis are rare outside the domain of physics and meteorology. For small
samples we have obtained exact results for a random series; but again, we should not,
as a rule, wish to apply tests to a series unless we were satisfied by simpler tests that it
was not random. Moreover, our results depend on the assumption of normality
and, for the most part, apply to circularly defined coefficients. Nevertheless they are
very illuminating and provide a number of interesting further problems.

The Madow-Leipnik distribution

48.15 If the statistics $t_1, t_2, \ldots, t_k$ are sufficient for $\theta_1, \theta_2, \ldots, \theta_k$ we have, in an
obvious notation,

$$P(u_1, \ldots, u_n \mid \theta) = Q(t \mid \theta)R(u)$$

and hence

$$P(t \mid \theta) = P(t \mid \theta_0)\,\frac{Q(t \mid \theta)}{Q(t \mid \theta_0)}. \tag{48.87}$$

Madow (1945) used this result to derive the approximate distribution of the serial
correlations for the non-null case from those for the null case $\theta = \theta_0$.

Suppose that $u_1, u_2, \ldots, u_n$ have a joint normal distribution, with mean $\mu$, of the
following type:

$$\log L = \text{constant} - \tfrac12\Big[A\sum_{i=1}^{n}(u_i-\mu)^2 + 2B\sum_{i=1}^{n}(u_i-\mu)(u_{i+1}-\mu)\Big]. \tag{48.88}$$

As before, we assume a circular definition of $r$, with $q$ and $p$ as its numerator and
denominator. Then $\bar u$, $p$ and $q$ are sufficient for $\mu$, $A$ and $B$. We take as null the
Anderson case with $A = 1$, $B = 0$ and hence find, from (48.87),

$$P(r, p \mid A, B) \propto P(r, p \mid 1, 0)\,\frac{e^{-\frac12(Ap+2Brp)}}{e^{-\frac12p}}$$

and since $r$ and $p$ are independent, with $p$ a multiple of a $\chi^2(n-1)$ variable, this is

$$\propto P(r \mid 1, 0)\,p^{\frac12(n-3)}e^{-\frac12p(A+2Br)}. \tag{48.89}$$

We integrate out from $p = 0$ to $\infty$ to get

$$P(r \mid A, B) \propto \frac{P(r \mid 1, 0)}{(\tfrac12A+Br)^{\frac12(n-1)}}. \tag{48.90}$$

Thus the non-null distribution differs from the null distribution in having the factor
$(\tfrac12A+Br)^{-\frac12(n-1)}$ adjoined to it. If it is known that the $u$'s have zero mean, (48.90) is
modified so as to have $\tfrac12n$ as the exponent in the denominator.

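48.16 In particular, let $u$ obey the Markoff relation

$$u_t = \rho u_{t-1} + \varepsilon_t.$$

We know (cf. (47.70)) that

$$\operatorname{var} u = \frac{1}{1-\rho^2}\,\operatorname{var}\varepsilon$$

and

$$\operatorname{cov}(u_t, u_{t+1}) = \rho\,\operatorname{var} u = \frac{\rho}{1-\rho^2}\,\operatorname{var}\varepsilon.$$

Then the distribution of the $\varepsilon$'s has likelihood given by

$$\log L = \text{constant} - \frac{1}{2\sigma^2}\sum_{i=1}^n \varepsilon_i^2 = \text{constant} - \frac{1}{2\sigma^2}\sum_{i=1}^n (u_i-\rho u_{i-1})^2,$$

where $\sigma^2 = \operatorname{var}\varepsilon$. Approximately this is equivalent to

$$\log L = \text{constant} - \frac{1}{2\sigma^2}\Big\{(1+\rho^2)\sum u_t^2 - 2\rho\sum u_t u_{t+1}\Big\}.$$

Hence, in (48.88) we have

$$A = \frac{1+\rho^2}{\sigma^2},\qquad B = -\frac{\rho}{\sigma^2},$$

so that $\tfrac12 A+Br \propto (1-2\rho r+\rho^2)$, and the distribution of $r$ (for known mean) is

$$P(r \mid \rho) = \frac{\Gamma(\tfrac12 n+1)}{\Gamma(\tfrac12 n+\tfrac12)\,\Gamma(\tfrac12)}\;\frac{(1-r^2)^{\frac12(n-1)}}{(1-2\rho r+\rho^2)^{\frac12 n}}. \tag{48.93}$$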
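As a numerical illustration of (48.93), the following sketch evaluates the density and integrates it by Simpson's rule. The function names `madow_leipnik_pdf` and `simpson`, and the values $n = 20$, $\rho = 0.5$, are choices made here for illustration only; they are not from the text. The check is that the total probability is unity for a non-zero $\rho$ and that the mean agrees with $n\rho/(n+2)$.

```python
import math

def madow_leipnik_pdf(r, n, rho):
    """Density (48.93) of a circular serial correlation r (known zero mean)."""
    log_const = (math.lgamma(0.5 * n + 1.0)
                 - math.lgamma(0.5 * n + 0.5)
                 - math.lgamma(0.5))
    numerator = max(0.0, 1.0 - r * r) ** (0.5 * (n - 1))
    denominator = (1.0 - 2.0 * rho * r + rho * rho) ** (0.5 * n)
    return math.exp(log_const) * numerator / denominator

def simpson(f, a, b, steps=20000):
    """Composite Simpson rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

n, rho = 20, 0.5  # illustrative values only
total_mass = simpson(lambda r: madow_leipnik_pdf(r, n, rho), -1.0, 1.0)
mean_r = simpson(lambda r: r * madow_leipnik_pdf(r, n, rho), -1.0, 1.0)
print(total_mass, mean_r, n * rho / (n + 2))
```

The normalizing constant does not involve $\rho$, so the first check is a useful guard against a mis-transcribed exponent.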

48.17 This remarkable form has been studied by Leipnik (1947), Quenouille (1948), Jenkins (1956) and Kendall (1957). Its moments are not nearly so easy to obtain by straightforward integration as might be expected.

For the moment of order $k$ about the origin, writing $C$ for a constant, we have

$$\mu'_k = C\int_{-1}^{1}\frac{(1-r^2)^{\frac12(n-1)}\,r^k}{(1+\rho^2-2\rho r)^{\frac12 n}}\,dr,$$

$$\frac1n\frac{\partial\mu'_k}{\partial\rho} = \int_{-1}^{1}\frac{(r-\rho)\,r^k}{1+\rho^2-2\rho r}\,dF.$$

Writing $2\rho r = (1+\rho^2)-(1+\rho^2-2\rho r)$, we have

$$\int\frac{r^k\,dF}{1+\rho^2-2\rho r} = \frac{1+\rho^2}{2\rho}\int\frac{r^{k-1}\,dF}{1+\rho^2-2\rho r} - \frac{1}{2\rho}\,\mu'_{k-1}$$

and

$$\frac1n\frac{\partial\mu'_k}{\partial\rho} = \frac{1-\rho^2}{2\rho}\int\frac{r^k\,dF}{1+\rho^2-2\rho r} - \frac{1}{2\rho}\,\mu'_k,$$

whence we find

$$\Big(\frac1n\frac{\partial}{\partial\rho}+\frac{1}{2\rho}\Big)\mu'_k = \Big(\frac{1+\rho^2}{2n\rho}\frac{\partial}{\partial\rho}+\frac12\Big)\mu'_{k-1}. \tag{48.94}$$

It will then be evident by induction that $\mu'_k$ is a polynomial of order $k$ in $\rho$. Moreover, even-order moments contain only even powers of $\rho$, and odd-order moments contain only odd powers of $\rho$. Put

$$\mu'_k = \sum_m a_{km}\,\rho^m. \tag{48.95}$$

Differentiating (48.94) $m$ times and putting $\rho = 0$ we find

$$a_{km} = \frac{1}{n+2m}\big\{(m+1)\,a_{k-1,m+1} + (n+m-1)\,a_{k-1,m-1}\big\},\qquad a_{00} = 1. \tag{48.96}$$

We then find, since $a_{00} = 1$,

$$a_{11} = \frac{n}{n+2},$$

giving

$$\mu'_1 = \frac{n\rho}{n+2}. \tag{48.97}$$

Successive applications of (48.96) then yield

$$\mu'_2 = \frac{1}{n+2} + \frac{n(n+1)}{(n+2)(n+4)}\,\rho^2, \tag{48.98}$$

$$\mu'_3 = \frac{3n\rho}{(n+2)(n+4)} + \frac{n(n+1)}{(n+4)(n+6)}\,\rho^3, \tag{48.99}$$

$$\mu'_4 = \frac{3}{(n+2)(n+4)} + \frac{6n(n+1)}{(n+2)(n+4)(n+6)}\,\rho^2 + \frac{n(n+1)(n+3)}{(n+4)(n+6)(n+8)}\,\rho^4 \tag{48.100}$$

and so on. In particular, for the moments about the mean

$$\mu_2 = \frac{1}{n+2} - \frac{n(n-2)}{(n+2)^2(n+4)}\,\rho^2 \sim \frac{1-\rho^2}{n}, \tag{48.101}$$

$$\mu_3 = -\frac{6n\rho}{(n+2)^2(n+4)} + \frac{2n(n-2)(3n-2)}{(n+2)^3(n+4)(n+6)}\,\rho^3 \sim \frac{-6\rho(1-\rho^2)}{n^2}, \tag{48.102}$$

$$\mu_4 \sim \frac{3(1-\rho^2)^2}{n^2}. \tag{48.103}$$

We note in particular that, in standard measure, $\mu_3$ tends to zero and $\mu_4$ tends to 3, illustrating the tendency of the distribution to normality.

The distribution, it must be remembered, refers to the case when the mean is known (and can therefore be assumed to be zero).

White (1957) and Leipnik (1958) have derived expressions for the moments in terms of polynomials of the Gegenbauer type; Leipnik derives an expression for the characteristic function and shows that it tends to the normal form.
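The recurrence (48.96) lends itself to exact arithmetic. The sketch below builds the coefficients $a_{km}$ with rational numbers, so that the closed forms (48.97) to (48.101) can be checked for any chosen $n$. The function names and the values $n = 10$, $\rho = \tfrac12$ are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def moment_coefficients(k_max, n):
    """a[k][m] = coefficient of rho**m in the k-th moment about the origin."""
    a = [{0: Fraction(1)}]          # mu'_0 = 1, i.e. a_00 = 1
    for k in range(1, k_max + 1):
        prev, row = a[k - 1], {}
        for m in range(k + 1):
            # Recurrence (48.96); missing coefficients are zero.
            value = ((m + 1) * prev.get(m + 1, Fraction(0))
                     + (n + m - 1) * prev.get(m - 1, Fraction(0))) / (n + 2 * m)
            if value != 0:
                row[m] = value
        a.append(row)
    return a

def moment(coeffs_k, rho):
    """Evaluate the polynomial sum_m a_km rho^m."""
    return sum(c * rho ** m for m, c in coeffs_k.items())

n = 10                      # illustrative series length
a = moment_coefficients(4, n)
rho = Fraction(1, 2)        # illustrative parameter value
mean = moment(a[1], rho)
variance = moment(a[2], rho) - mean ** 2
print(a[2], a[3], a[4], variance)
```

Because the coefficients are held as fractions, any disagreement with the printed formulae would show up exactly rather than as rounding noise.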


48.18 The approximate form of the variance given in (48.101),

$$\operatorname{var} r = \frac{1-\rho^2}{n}$$

(which, we note in passing, is not the same form as for a product-moment coefficient), suggests the normalizing transformation

$$r = \sin z,\qquad \rho = \sin\zeta.$$

This was tried by Jenkins (1954), who found for $x = z-\zeta$


$$\mu'_1(x) = -\frac{3\rho}{2(1-\rho^2)^{1/2}\,n} + O(n^{-2}), \tag{48.104}$$

$$\mu_2(x) = \frac1n + \frac{5\rho^2-2}{2(1-\rho^2)\,n^2} + O(n^{-3}), \tag{48.105}$$


(48.106) 

y, = + 

(48.107) 


For moderate values of $\rho$ this may be adequate, but evidently breaks down near $\rho = 1$.

Daniels' approximations

48.19 A major advance in the derivation of sampling distributions of serial correlations was made by Daniels (1956), who brought the saddlepoint method into statistics systematically for the first time. For a detailed account of those methods see Jeffreys and Jeffreys, Methods of Mathematical Physics (1956). Briefly, they amount to this: in the complex plane the integral of an analytic function around a contour which contains no singularities is zero; thus one path of integration can be deformed into another, provided that no singularity is crossed. The method looks for a path which runs through a saddlepoint of the surface, the presumption being that there the function falls away most steeply from its maximum, and hence that, in the neighbourhood of the saddlepoint, the values of the function being integrated are most highly concentrated. The path then gives us the steepest descent from a maximum value, and an expansion round the point will give us a good approximation to the integral required.

In applications the method simplifies where means or ratios of mean quantities are concerned.

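48.20 As before, let $r = q/p$. Let $M(\theta_1, \theta_2)$ be the moment-generating function of $p$ and $q$, i.e.

$$M(\theta_1,\theta_2) = \int e^{\theta_1 p+\theta_2 q}\,dF.$$

In terms of the Fourier inversion we have

$$f(p,q) = \frac{1}{(2\pi i)^2}\iint M(\theta_1,\theta_2)\,e^{-\theta_1 p-\theta_2 q}\,d\theta_1\,d\theta_2,$$

the integration being along the imaginary axes of $\theta_1$, $\theta_2$. In particular, writing $\theta_3 = \theta_1+r\theta_2$,

$$f(p,rp) = \frac{1}{(2\pi i)^2}\iint M(\theta_3-r\theta_2,\theta_2)\,e^{-\theta_3 p}\,d\theta_3\,d\theta_2,$$

or

$$\int_0^\infty f(p,rp)\,e^{\theta_3 p}\,dp = \frac{1}{2\pi i}\int M(\theta_3-r\theta_2,\theta_2)\,d\theta_2$$

so that, when differentiation is permissible,

$$\int_0^\infty p\,f(p,rp)\,e^{\theta_3 p}\,dp = \frac{1}{2\pi i}\int\frac{\partial}{\partial\theta_3}M(\theta_3-r\theta_2,\theta_2)\,d\theta_2$$

and, since the expression on the left with $\theta_3 = 0$ is the frequency distribution of $r$, we have

$$h(r) = \frac{1}{2\pi i}\int\Big[\frac{\partial}{\partial\theta_3}M(\theta_3-r\theta_2,\theta_2)\Big]_{\theta_3=0}\,d\theta_2. \tag{48.112}$$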


Thisisequivalentto Geary’sresultof (11 78 t , ^» =0 S ' ( 48 '‘ 

transformation rsuTh*" 8;=^ ^ in Ex6ra ' Se 11 ' 

u r s I f (48. 112)beco mi 

2w‘ J W, [ - r K o ,)} f? ' 


24. 

other form. If the 
es 




(48.113)

48.21 Consider first of all a coefficient circularly defined, with known (zero) mean and unit variance, from a circular Markoff process. The joint distribution of the $u$'s is then (cf. (48.91))

$$dF = \frac{1-\rho^n}{(2\pi)^{\frac12 n}}\exp\Big[-\tfrac12\Big\{(1+\rho^2)\sum_{t=1}^n u_t^2 - 2\rho\sum_{t=1}^n u_t u_{t+1}\Big\}\Big]\,du_1\ldots du_n \tag{48.114}$$

with, of course, $u_{n+1} = u_1$.

The m.g.f. (compare (48.69)) is

$$M(\theta_1,\theta_2) = (1-\rho^n)\,\Delta^{-\frac12} \tag{48.115}$$

where

$$\Delta = \begin{vmatrix} 1+\rho^2-2\theta_1 & -(\rho+\theta_2) & 0 & \cdots & -(\rho+\theta_2)\\ -(\rho+\theta_2) & 1+\rho^2-2\theta_1 & -(\rho+\theta_2) & \cdots & 0\\ \vdots & & & & \vdots\\ -(\rho+\theta_2) & 0 & \cdots & -(\rho+\theta_2) & 1+\rho^2-2\theta_1 \end{vmatrix}. \tag{48.116}$$

This is a circulant which reduces to

$$\Delta = \{1+\rho^2-2\theta_1-2(\rho+\theta_2)\}\prod_{k=1}^{n-1}\Big\{1+\rho^2-2\theta_1-2(\rho+\theta_2)\cos\frac{2\pi k}{n}\Big\}.$$

Put

$$\frac{1+\rho^2-2\theta_1}{\rho+\theta_2} = z+\frac1z. \tag{48.117}$$

Then $\Delta$ reduces to

$$\Delta = \frac{(\rho+\theta_2)^n}{z^n}\prod_{k=0}^{n-1}\Big(z^2-2z\cos\frac{2\pi k}{n}+1\Big) = \frac{(\rho+\theta_2)^n(1-z^n)^2}{z^n}. \tag{48.118}$$

Hence

$$M(\theta_3-r\theta_2,\theta_2) = \frac{1-\rho^n}{1-z^n}\;\frac{(1-2rz+z^2)^{\frac12 n}}{(1-2\rho r+\rho^2-2\theta_3)^{\frac12 n}} \tag{48.119}$$

where

$$\theta_2 = -\rho+\frac{z(1-2\rho r+\rho^2-2\theta_3)}{1-2rz+z^2}. \tag{48.120}$$

Then

$$M\,\frac{\partial\theta_2}{\partial z} = \frac{(1-\rho^n)(1-z^2)(1-2rz+z^2)^{\frac12 n-2}}{(1-z^n)(1-2\rho r+\rho^2-2\theta_3)^{\frac12 n-1}} \tag{48.121}$$

and (48.113) becomes

$$h(r) = \frac{(n-2)(1-\rho^n)}{2\pi i\,(1-2\rho r+\rho^2)^{\frac12 n}}\int\frac{(1-z^2)(1-2rz+z^2)^{\frac12 n-2}}{1-z^n}\,dz. \tag{48.122}$$

There remains to determine the path of integration. Consider the pair of transformations which together compose (48.117):

$$\zeta = \tfrac12\Big(z+\frac1z\Big),\qquad \theta_2 = -\rho+\frac{1-2\rho r+\rho^2}{2(\zeta-r)}. \tag{48.123}$$

The region $|z| < 1$ will be seen to be mapped on the whole $\theta_2$-plane cut along the real axis exterior to the interval

$$\Big(-\frac{(1+\rho)^2}{2(1+r)},\ \frac{(1-\rho)^2}{2(1-r)}\Big).$$

A path from $-i\infty$ to $i\infty$ in the $\theta_2$-plane, passing through the gap in the real axis, corresponds to a path in the $z$-plane joining $e^{-i\phi}$ to $e^{i\phi}$, where $r = \cos\phi$. The path of integration for (48.122) is therefore of this kind.

Consider the factor $(1-2rz+z^2)^{\frac12 n-2} = \{1-r^2+(z-r)^2\}^{\frac12 n-2}$ of the integrand in (48.122). It has a saddlepoint (maximum) where $z = r$, a real point. Further, the path of steepest descent is perpendicular to the real axis, so that the path of integration required is the straight line joining $e^{-i\phi}$, $e^{i\phi}$.

So far the results are exact. Let us now neglect the factor $\rho^n$ in (48.122) and $z^n$ in the denominator of the integrand in (48.122). We then have

$$h(r) = \frac{n-2}{2\pi i\,(1-2\rho r+\rho^2)^{\frac12 n}}\int(1-z^2)(1-2rz+z^2)^{\frac12 n-2}\,dz. \tag{48.124}$$

Put

$$z = r+iw(1-r^2)^{\frac12}.$$

Then $1-2rz+z^2 = (1-r^2)(1-w^2)$ and we have

$$h(r) = \frac{(n-2)(1-r^2)^{\frac12(n-1)}}{2\pi(1-2\rho r+\rho^2)^{\frac12 n}}\int_{-1}^{1}(1+w^2)(1-w^2)^{\frac12 n-2}\,dw = \frac{\Gamma(\tfrac12 n+1)}{\Gamma(\tfrac12 n+\tfrac12)\,\Gamma(\tfrac12)}\;\frac{(1-r^2)^{\frac12(n-1)}}{(1-2\rho r+\rho^2)^{\frac12 n}}, \tag{48.125}$$

which is the Madow-Leipnik distribution (48.93).

48.22 If we are not content to neglect the factor $1-z^n$ in (48.122) the integral cannot be evaluated in closed form. If, however, we expand it in powers of $z^n$ we obtain on integration the series

h(r) - r( in+W- f) r(l-r a ) Kw - 1) _ 3 * _ 

{ J _ rt*(l—2pr+p 2 ) in \ r(i«+i) 2»r(f« + i)<fr» k 


4 


d 2 ‘ 


2 “r(i+i) 


r 2Vl(5n-l) 


- (1 — r 2 ) 


(48.126) 


Daniels (1956), to whom this result is due, has obtained an upper bound to the error involved in approximating to (48.122) by its first term. The error is, in fact, small near $\rho = 0$ but not near $\rho = 0.5$ for $n$ of the order of 20.


48.23 By the use of this method Daniels obtained a number of further results, which we quote without the detailed derivation.

For a circular process with unknown mean and a circular serial correlation coefficient,


h{r) = _ (”~ 3 )(1 ~P W ) 


2ni(\ —p)(l ~2pr+p 2 y( n -i) 

Again ignoring the factor *■ we have 

h(r) = ~4)(1- r 2 f n ~ i 


( W>(1-^ 2 )(1- 2rz + * 2 )J(n- 6 ) 
1 —z n 


2 ^ s ^ 5KWKT^2^^ ) | 1 ~ r - - (1 + r) 1. 


(48.127)


(48.128)

It is to be noted that we can derive the moments of this distribution from those of the Madow-Leipnik distribution (48.93).

An alternative form may be derived by the following considerations: the final term in (48.128) is relatively of order $n^{-1}$. If we replace $r$ by $\rho$ in it, the result remains true to order $n^{-1}$. We may therefore remove it, but the constants in (48.128) then need revaluation to ensure that the integral of $h(r)$ over the variate range is unity. We arrive at


h{r) = 


r(i« + |) (l_ r )(l_r>)**-i 
27t*T(£n) '(1 —2pr+p 2 ) i(H ~ 1) 


(1 + 0(« _3/2 )}« 


(48.129) 


48.24 For the Markoff case and a non-circular process with known mean consider the non-circular statistic, defined by

$$r = \frac{u_1u_2+\ldots+u_{n-1}u_n}{\tfrac12 u_1^2+u_2^2+\ldots+u_{n-1}^2+\tfrac12 u_n^2}. \tag{48.130}$$

Daniels finds


M!(hr¥l(HO(,i) (48.ni) 

A(r)_ 2S r(jS=2j (i-p^a-V+p 2 )*”- 1 

or an equivalent approximation 

hM - rp+l) ;i+n( „- 3 /s )} (48.132) 

{ ’ “ srr(iAT+2) (1-2^+p 2 )* 1 ' 1 ' 

where 

v_,_l + T £L. (48.133) 


\-p‘ 


48.25 For the non-circular Markoff process with unknown mean, using r defined 
by (48.130) and N by (48.133) we have 

r (|V+|)-0=^=^ p + 0(,-^}. (48.134) 


A(r) - 2^r(iV){V(l-p)-(l+/>)} (1 —2pr+p 2 ) 4(Ar_1) 


48.26 It is possible to take these results a good deal further. Daniels (1956) deals 
with the general autoregressive process circularly defined. 

In cases of higher order than the Markoff process it is also of some interest to consider the joint distribution of two or more serial correlations and of the partial correlations. Quenouille (1949b) was the first to do so. For some later work see Jenkins (1954, 1956), Watson (1956) and Daniels (1956). To save space we shall not enter into a detailed discussion of the results. Except in the Markoff case it appears that only circularly defined statistics and processes are reasonably tractable. Daniels' method can be extended to the non-circular case, but apparently nobody has yet had the stamina to embark on the labour involved.

48.27 A final word may be added concerning distributions when the residual term $\varepsilon$ in an autoregressive scheme is not normal. The matter has not received much attention from statisticians. Quenouille (1948) did some sampling experiments, in particular with rectangularly distributed $\varepsilon$, and came to the conclusion that approximate normal theory provides satisfactory tests of serial correlations, at least for moderate or large samples.

EXERCISES

48.1 Verify equation (48.12).

48.2 For a Yule scheme given by

$$u_{t+2}-1.1\,u_{t+1}+0.5\,u_t = \varepsilon_{t+2}$$

show that approximately $\operatorname{var} r_j = 2.44/n$.

48.3 For a series in which $\rho_s = 0$, $s \geq 4$, show that

$$E(r_j) = -\frac{1}{n-j}\,(1+2\rho_1+2\rho_2+2\rho_3),\qquad j > 3.$$

(Bartlett, 1946)

48.4 A sample correlation coefficient is defined circularly as

$$r_1 = \frac{\sum_{i=1}^n u_iu_{i+1}-n\bar u^2}{\sum_{i=1}^n u_i^2-n\bar u^2}.$$

Show that in a Markoff scheme

$$E(r_1) = \rho-\frac{1+4\rho}{n}.$$

(Kendall, 1954)

48.5 For the circular definition of the previous exercise show that, in samples from a nnrm i 
random series, normal 

var (,J = 


(n + l)(« —l) 3 ' 

48.6 In continuation of 48.6 show that 

and that, for the circular definition, M + 3) 

= __2(2n-_l)(„-6) 

(«-l) 3 (?z +1)(« + 3 )‘ 


(Moran, 1947) 


(Moran, 1947) 


Z I:" - *• 

« —* *- ■ m.,m 


var v = (*4 + 2) (1 +p 2 ) 
«(l-p 2 )~ 


process, in the manner of 48.1, show that, in 


1 

var c = z 
n 


(l+p 2 )(l- p 2n r 

- +p» 4 2j+ (*4 + 2Kl + p 2 )\ 

'L. -n ^ /- 


C ° V (C ' ®) == — 2j+ ^4 + 2Kj +p 2) 

H L l- fl » 
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and hence that, independently of k 4 , 

1 f(l+P a )(l-P 2i ) „• „] 

var ri-~l -j ~i -J 


(Bartlett, 1946) 


48.9 Analogously to (48.9), show that for large samples, 

cov (r,, rj+k) =- 2 {pip l+ j + pipi +2l+l c + 4pjpj+>cpi 2 -2p1pil>i+l+>c- 2pi+kp i pi+k} ' 

— 00 (Bartlett, 1946) 

48.10 Show (cf. Example 48.4) that the large-sample variance of rj at (48.9) is given by 

n var rj = [(1 + 2 p)) co. z° + co. z 2i - 4pj co. z j ] in G 2 (z) 
where G(z) is the autocorrelation generating function and “ co ” means the coefficient 
Verify on the result of Exercise 48.8. 

48.11 Continuing Exercise 48.4, show that 


E(rj) = —3^(1—P , ) + 3ip J -‘j> n 3 


(Kendall, 1954) 


48.12 Defining the first serial correlation (known zero mean) as 

n— 1 

E WtWi +1 

n i=i_ 

G = —f ~ n _ 

71 — 1 V - .2 


show that, for a random series, r x and the denominator are independent and hence derive 

Vi(ri) = 0 = 


Pi Cg) = 


Pi(r i) 


(« — 1) (n + 2) 

3 n 3 (w 2 + 4n —9) 

(n — l) 4 (n + 2) (n + 4) (n + 6) 


(Moran, 1948) 


48.13 Starting from the Madow-Leipnik distribution of (48.93), make the transformation 

r = tanh z, p = tanh £, z-t, — x, 
and by expansion show that the distribution of x is given by 


Hx) = 77w™ p {-€) X 

a 2 = cosh 2 £/« = !/{«(!— p 2 )} 


where 

and 


/1 x 2 {\-p 2 ) nx 4 (l-p 2 )(l-3p 2 ) p 2 x 2 

x - 1 - fX+ \*k - 2~ + - 12 - + ~T 

/1 5**(1 -(>*) (1 -p ! )(1 -3p ! ) , p ! **\ 

-'“is—r ~ + —12 * + -rf 


+ 0 («~ 2 ). 
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452 . . tha , s is distributed approximately normally about 

Obtain the moments and show that a ^ 

£--^-rt+- 



^eau 


with variance 


^KW 2 ) « 2 (l-/ j2 ) 2 

1 2p 2 




(Quenouille, i 948) 


48.14 Taking the Yule scheme of Exercise 48.2, show that a generating function f 0r 

autocorrelations is given by 

oo 


the 


0 2 S p**’ - ttzh^+o-s* 2 )(i — +o-5^r- 2 ) 

— 00 v 


0-7333 —0-5^ . 0-7333-0-5g- x _ x 

= ^l-l-ls+O-S'i 2 l-lT^+O^" 2 * 

where a 2 = var m. Squaring and expanding, show that 

S P? = 2-44, 

— 00 

and hence confirm that approximately, for large samples, 

var rj = 2'44/m. 

(Quenouille, I947 a ) 

Is 

48.15 For the general linear autoregressive scheme 2 XjUt-j = £* show that 

j =i 


lim 

m —>oo 


var A’ 




( 2 ;) 


= var e / 2 


'l -2 


(Murteira, 1951) 

V 

48.16 For the scheme of the previous exercise, in which £ t is replaced by 2 $£ { _j, show 

that the same limit takes the value * -1 

(v M 


S (-«>/), 

var e / ___ 








0=1 


48.17 In the Madow-Leipnik distribution (48.93), put 

and show that r = smy t p = sin A, y-l = 

A(a) = 


(Murteira, 1951) 


a /£ exp f ~2?T^53* ,+ r—- ^±13^5 

v l 2(1-/»•)* 4 m 24 1 —1>2 


^ y + il+iV) 4 1 P 2 . 2 , 1 P(l + 5p 2 ) 

2(1 4 » 24 i- P . ** + gr^! nW- 8 Twr i 

In 'I /rt j 




TT . . 8 (l-/> 2 )* 48 (1 — p2)S/2 

Hance denve equaUons (48.104) and (48.105). 


8(l-„ a )** S+ ^ «V -i —^- B V4-0(<r’) 


48 (1 

(Jenkins, 1954) 


M 
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48.18 In samples from a random series with zero mean and unit var’ 
characteristic function of _ _ 

p = q = r = £EuiWi+ 2 , 

circularly defined, is the circulant 

« / _ , A r\ —i 


Deduce that if 


T\ (* n a 2nk B 4rcfe\ 

n 1-03,-00 cos - Or cos-— 1 . 

fr=i \ n n / 

- - <£) 

_1_ 

/^lO = 1^11 = ^20 = 72 + 2 

A^ll = /^12 = /^31 == 0 
2 

^ 21 “ (« + 2)(»+4) 

w +12 

^ 22 ~ (n + 2) (« + 4) (m + 6)’ 


(Jenkins, 1954) 


48.19 Following the previous exercise, if statistics are defined with mean u, e.g. 

p =■ \ S (l/i w) 2 , 

show that the characteristic function of p f q y r is now 


71 — 1 / 

11 ( 

*=i \ 


Ink _ 4ttA 

>-C/p COS J 

n n / 


and hence that 


1 — Op — 0 q cos 


^ 11_ *«-l* 

(Jenkins, 1954, who gives values for higher moments.) 


48.20 For the statistic $r_1$ defined as in (48.42) but with known mean, and therefore with the omission of $\bar u$, show that the c.f., corresponding to (48.80), is the same with the omission of the factor in $(\alpha+\beta)$. Hence show that odd-order moments vanish and approximately

$$\mu_{2k} = \frac{1\cdot3\cdot5\cdots(2k-1)}{(n+2)(n+4)\cdots(n+2k)}.$$

Hence verify (48.85) with $(n+1)$ replacing $n$.

(Dixon, 1944)

CHAPTER 49

SPECTRUM THEORY

Harmonic analysis

49.1 In Chapter 47 we have encountered the spectrum and associated functions as transforms of the autocorrelations, but pointed out that they arose naturally from a consideration of measures of the closeness of the correlation between the series and certain harmonic terms. We proceed to develop this approach more fully.

Fourier was led by his study of heat-flow to consider the expansion of functions in series of harmonic terms of the type

$$f(x) = \sum_{r=1}^{\infty}a_r\sin rx+\tfrac12 b_0+\sum_{r=1}^{\infty}b_r\cos rx. \tag{49.1}$$

Notwithstanding the cyclical character of the individual terms, a very wide class of non-periodic functions can be represented in this way over a limited range. It is, for example, sufficient that, in the range $-\pi$ to $+\pi$, $f(x)$ be single-valued, continuous except for a finite number of discontinuities, and have only a finite number of maxima or minima, for such an expansion to be valid. The series on the right in (49.1) is called a Fourier series. It has the attractive property that successive terms are orthogonal. For

$$\int_{-\pi}^{\pi}\cos rx\,\cos sx\,dx = \int_{-\pi}^{\pi}\sin rx\,\sin sx\,dx = \begin{cases}0, & r \neq s,\\ \pi, & r = s.\end{cases} \tag{49.2}$$

Hence, on multiplying (49.1) by $\sin rx$ and by $\cos rx$ and integrating, we find

$$a_r = \frac1\pi\int_{-\pi}^{\pi}f(x)\sin rx\,dx, \tag{49.3}$$

$$b_r = \frac1\pi\int_{-\pi}^{\pi}f(x)\cos rx\,dx. \tag{49.4}$$

The series may also be written in the form

$$f(x) = \sum_{r=0}^{\infty}c_r\sin(rx+\phi_r) \tag{49.5}$$

where $\phi_r$ is a phase angle.

Since all the terms in (49.1), apart from the constant, are of period $2\pi$, the expression for $f(x)$ has that period. If $f(x)$ is defined over some interval $-L$ to $L$ we may expand in terms of $\sin(\pi rx/L)$ and $\cos(\pi rx/L)$. This, of course, is merely a matter of scaling the interval from one of length $2\pi$ to one of length $2L$.

49.2 Angles, measured as usual in radians, have zero dimensions. Thus the quantity $\alpha t$ in $\sin\alpha t$ has zero dimension, and $\alpha$ is accordingly in radians per time-unit. It is sometimes called angular frequency. Where no ambiguity is involved we shall simply call it the frequency. However, $\sin\alpha t$ repeats itself with period $2\pi/\alpha$ and therefore the number of cycles per time-unit is $\alpha/2\pi$, which may also be regarded as the frequency. The period $2\pi/\alpha$ is of dimension $t$ and is also called the "wavelength," although, in our context, "length" is a period of time.

49.3 It appears, then, that a function may be expanded in a series of sines and cosines, the successive terms in (49.1) having periods $2\pi$, $2\pi/2$, $2\pi/3$, etc., and the corresponding angular frequencies being 1, 2, 3, with cycle frequencies $1/2\pi$, $2/2\pi$, $3/2\pi$.

More generally, when $f(x)$ is defined over the interval $2L$, the angular frequencies are typified by $\pi r/L$. Thus there is one fundamental frequency $\pi/L$ and the others are integral multiples of it. Such a representation would be rather artificial if we knew that $f(x)$ was the sum of harmonic components with incommensurable frequencies. We are thus led to consider the more general harmonic series

$$f(x) = \sum_{j=0}^{\infty}a_j\sin(\alpha_jx)+\sum_{j=0}^{\infty}b_j\cos(\alpha_jx), \tag{49.6}$$

where the $\alpha$'s can have any real values. There is now no simple way of evaluating $a_j$ and $b_j$ such as is given by (49.3) and (49.4). The problem of estimating them was considered in the nineteenth century by physicists and meteorologists, and although a great deal of knowledge has now been accumulated, the methods in essence are the same as those used by earlier authors. However, there has been a change of outlook. Former authors were looking for concealed harmonics. The more modern approach is to regard the spectrum as a characteristic of the time-series whether it is truly a sum of harmonics or not.

Nyquist frequency and aliases 

49.4 For series observed at equal unit intervals of time there are two important features of harmonic analysis to observe. It is clearly possible for periodicities of less than one unit to escape notice: for example, if we observe a series every January 1st, seasonal movements will not be revealed. We need at least two observations in the year to detect periodicities of one year. Generally, for a time-interval $t_0$ between observations we cannot measure periods smaller than $2t_0$, or angular frequencies higher than $\pi/t_0$. This limiting value is known as the Nyquist frequency.

In the spectral density function defined in 47.10 as

$$w(\alpha) = \sum_{j=-\infty}^{\infty}\rho_j\,e^{i\alpha j}, \tag{49.7}$$

our time-interval was unity and the range of $\alpha$ is from $0$ to $\pi$. The ordinate at $\pi$ represents the value of the spectral density at the Nyquist frequency.

49.5 The second effect to remark is also related to the interval of observation. Suppose that the interval is unity, and consider the term $\sin(2\pi t/3)$ for $t = 1, 2, 3$, etc. Its values are $\sqrt3/2$, $-\sqrt3/2$, $0$, $\sqrt3/2$, etc. But these are also the values which would be observed for $\sin(8\pi t/3)$ or $\sin(14\pi t/3)$, etc. The width of the interval of observation does not permit of a distinction between angular frequency $2\pi/3$ or any of the angular frequencies $2\pi/3+2\pi j$, $j = 1, 2, 3$, etc. These higher frequencies are known as aliases. So far as observation goes they are all equally consonant with the data.
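The aliasing just described is easily exhibited numerically. In the sketch below (the helper name `sampled` is ours, chosen for illustration), the angular frequencies $2\pi/3$, $8\pi/3$ and $14\pi/3$ are sampled at unit intervals and give the same values to within rounding error.

```python
import math

def sampled(freq, times):
    """Values of sin(freq * t) at the given observation times."""
    return [math.sin(freq * t) for t in times]

times = range(1, 7)                      # observations at unit intervals
base = 2.0 * math.pi / 3.0
aliases = [base + 2.0 * math.pi * j for j in (0, 1, 2)]   # 2pi/3, 8pi/3, 14pi/3
samples = [sampled(a, times) for a in aliases]

for freq, values in zip(aliases, samples):
    print(round(freq, 4), [round(v, 4) for v in values])
```

Between the points of observation the three curves differ widely; only at integer $t$ do they coincide.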

49.6 In 47.11 we defined functions

$$a(\alpha) = \frac{1}{\sqrt{n\pi}}\sum_{t=1}^{n}u_t\cos\alpha t, \tag{49.8}$$

$$b(\alpha) = \frac{1}{\sqrt{n\pi}}\sum_{t=1}^{n}u_t\sin\alpha t. \tag{49.9}$$

We showed that the intensity $I(\alpha)$, defined as the sum of squares of $a(\alpha)$ and $b(\alpha)$, was equal in the limit to the spectral density function $w(\alpha)$ multiplied by $\sigma^2/\pi$. We graphed $w(\alpha)$ for a Markoff and a Yule scheme in Fig. 47.4 and 47.5. A few practical examples are given in Fig. 49.1 to 49.3 for comparison with the correlograms of Exercises 47.[…] to 47.22 (namely the Beveridge wheat series, the marriage-rate data of Table 47.2, and the artificial Yule scheme of Table 47.4).

Fig. 49.3. Power spectrum of the data of Table 47.4 (second-order autoregressive scheme). The dotted line is a smoothed spectrum using a Parzen window (Exercise 49.7) and the first fifteen covariances.

Observational material often presents these wild fluctuations and we shall see presently why this is so.
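The definitions (49.8) and (49.9) translate directly into a few lines of code. The sketch below computes the intensity $I(\alpha) = a^2(\alpha)+b^2(\alpha)$ for a pure sinusoid $c\sin\beta t$; the function name `intensity` and the values of $n$, $c$ and $\beta$ are illustrative assumptions. When $\beta$ is a multiple of $2\pi/n$ the sums can be done exactly, giving $I(\beta) = c^2n/(4\pi)$ and zero intensity at the other frequencies of that form.

```python
import math

def intensity(u, alpha):
    """I(alpha) = a(alpha)^2 + b(alpha)^2 with the divisor sqrt(n*pi) of (49.8), (49.9)."""
    n = len(u)
    scale = 1.0 / math.sqrt(n * math.pi)
    a = scale * sum(x * math.cos(alpha * t) for t, x in enumerate(u, start=1))
    b = scale * sum(x * math.sin(alpha * t) for t, x in enumerate(u, start=1))
    return a * a + b * b

n, c = 120, 2.0                          # illustrative length and amplitude
beta = 2.0 * math.pi * 10 / n            # a frequency of the form 2*pi*p/n
u = [c * math.sin(beta * t) for t in range(1, n + 1)]

peak = intensity(u, beta)
elsewhere = intensity(u, 2.0 * math.pi * 17 / n)
print(peak, c * c * n / (4.0 * math.pi), elsewhere)
```

The peak grows in proportion to $n$, which is why a genuine harmonic stands out increasingly as the series lengthens.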

49.7 It is sometimes convenient to take as ordinate the logarithm of $w(\alpha)$ rather than $w(\alpha)$ itself. This avoids over-emphasis of the larger intensities and also has the advantage, as we shall see later, that the error bands in certain classes of estimation are of constant width. […] We will first consider the behaviour of the spectrum of a series in which a harmonic term is present.

49.8 Suppose that the series $u_t$ consists of a harmonic term with angular frequency $\beta$, added to other terms $\varepsilon_t$ not correlated with it:

$$u_t = c\sin\beta t+\varepsilon_t. \tag{49.10}$$

We calculate, for the harmonic term,

$$\sum_{t=1}^{n}\sin\beta t\,\sin\alpha t = \frac{\sin\{\tfrac12 n(\alpha-\beta)\}\cos\{\tfrac12(n+1)(\alpha-\beta)\}}{2\sin\{\tfrac12(\alpha-\beta)\}} - \text{a similar term with } +\beta \text{ in place of } -\beta. \tag{49.11}$$

In the neighbourhood of $\alpha = \beta$ this is dominated by the first term. The sum of cross-product terms involving $\varepsilon_t$ is of negligible size. Hence the intensity $I(\alpha)$ is the sum of the intensity for $\varepsilon$ and

$$I(\beta) = \frac{c^2}{4n\pi}\,\frac{\sin^2\{\tfrac12 n(\alpha-\beta)\}}{\sin^2\{\tfrac12(\alpha-\beta)\}}. \tag{49.12}$$

The corresponding periodogram ordinate is, from (47.30),

$$S^2(\beta) = \frac{c^2}{n^2}\,\frac{\sin^2\{\tfrac12 n(\alpha-\beta)\}}{\sin^2\{\tfrac12(\alpha-\beta)\}}. \tag{49.13}$$

Now suppose that




a-/? = « large, wz finite. 


(49.14) 


We have then, to a close approximation, 

c//n c 2 sin 2 

( 49 - 15 ) 

?r/‘ /J -7n 0t th n P, eri " do ? ram "nP ha ™ a peak of amplitude and this will be 
from! by kSSer peakS 0f diminia Ping intensity at distance fc f, etc., 


-* w» 

‘«Srj!;r except that at />•=«** «*** 

the divisor ^(nn) in (49 8) and 149 Ql ^ ^ !? pract ^ ce * The reason for choosing 

WS2?r:^3j- r - 5-a: 
»—t ,p C" “ •» In , le 


' ^ 










49.9 Whether the "side-bands" given by (49.15) show up either in spectrum or periodogram depends to some extent on the intervals of frequency at which they are computed. If the periodogram is plotted with period as abscissa (as, in our definition, it is) the side-bands become wider for increasing period. Let

, 2n 2 7i (49.17) 


- In 

A - y 

a 


Then from (49.14) 


or approximately 


2 7C 


[x —A = 


(49.18) 


so that the width of the side-band peaks depends on […].

Fig. 49.4 gives the periodogram of the Beveridge series for comparison with Fig. 49.1. The values were calculated by Beveridge, at first on a grid of wavelengths of fairly equal width, but supplemented by additional values where peaks seemed to be […]

Fig. 49.4. Periodogram of the Beveridge wheat-price index series, for comparison with the power spectrum of Fig. 49.1 (abscissa: period in years).


Example 49.1

It sometimes avoids tedious summation, and makes the essential point for asymptotic results, if we replace sums by integrals. For example, with $n$ large,

$$\sum_{t=1}^{n}\sin\alpha t\,\sin\beta t$$

may be replaced by

$$\int_0^T\sin\alpha t\,\sin\beta t\,dt,$$

where $T$ is the length of the series. The integral is seen to be

$$\frac12\Big[\frac{\sin\{(\alpha-\beta)T\}}{\alpha-\beta}-\frac{\sin\{(\alpha+\beta)T\}}{\alpha+\beta}\Big].$$

Likewise to the same degree of approximation

$$\int_0^T\sin\alpha t\,\cos\beta t\,dt = -\frac12\Big[\frac{\cos\{(\alpha-\beta)T\}-1}{\alpha-\beta}+\frac{\cos\{(\alpha+\beta)T\}-1}{\alpha+\beta}\Big].$$

The intensity near $\alpha = \beta$ is then given by

$$I(\beta) = \frac{c^2\sin^2\{\tfrac12(\alpha-\beta)T\}}{\pi T(\alpha-\beta)^2} \tag{49.19}$$

and the limiting case when $\alpha-\beta$ tends to zero may be discussed as before.

Non-harmonic periodicities

49.10 It must be remembered that a peak in the spectrum, interpreted as a harmonic, is only unrelated to other peaks if they all relate to pure sine or cosine terms. If there is present a periodic term which is not a simple harmonic there may be several peaks in the spectrum corresponding to it.

Consider a somewhat extreme case in which the periodicity is of the type shown in Fig. 49.5. Now

$$\tfrac12 x = \sin x-\tfrac12\sin 2x+\tfrac13\sin 3x-\ldots,\qquad 0 \leq x < \pi. \tag{49.20}$$

Thus, in the spectrum there will be peak intensities at frequencies 1, 2, 3, etc. Their intensities form a series with relative values $1$, $\tfrac14$, $\tfrac19$, etc. For non-harmonic periodic elements, therefore, there is the possibility of the fundamental frequency being echoed along the spectrum.
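The expansion (49.20) can be checked against the coefficient formula (49.3). The sketch below evaluates $a_r = \pi^{-1}\int_{-\pi}^{\pi}\tfrac12 x\sin rx\,dx$ by the midpoint rule (the function name and the number of steps are our own choices) and compares the results with $(-1)^{r+1}/r$; the squares of the coefficients give the relative intensities of the echoes.

```python
import math

def sine_coefficient(f, r, steps=20000):
    """a_r = (1/pi) * integral over (-pi, pi) of f(x) sin(rx) dx, by the midpoint rule."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        x = -math.pi + (i + 0.5) * h
        total += f(x) * math.sin(r * x)
    return total * h / math.pi

orders = (1, 2, 3, 4)
coeffs = [sine_coefficient(lambda x: 0.5 * x, r) for r in orders]
relative_intensity = [c * c for c in coeffs]

for r, c, w in zip(orders, coeffs, relative_intensity):
    print(r, round(c, 5), round(w, 5))
```

The amplitudes fall off only as $1/r$, so that the second and third harmonics of a saw-tooth movement are far from negligible.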

Example 49.2

Consider what happens if the series has a linear trend in it. In fact, let us take the series $u_t = t$ and apply spectrum analysis to it. Approximately, as in Example 49.1, we have

$$\int_0^T t\sin\alpha t\,dt = \Big[-\frac{t\cos\alpha t}{\alpha}\Big]_0^T+\frac1\alpha\int_0^T\cos\alpha t\,dt = -\frac{T\cos\alpha T}{\alpha}+\frac{\sin\alpha T}{\alpha^2}. \tag{49.21}$$

Likewise,

$$\int_0^T t\cos\alpha t\,dt = \frac{T\sin\alpha T}{\alpha}+\frac{\cos\alpha T-1}{\alpha^2}. \tag{49.22}$$

Thus the intensity is given by

$$I(\alpha) = \frac{1}{\pi T}\Big\{\frac{T^2}{\alpha^2}+O(T)\Big\} = \frac{T}{\pi\alpha^2}+O(1). \tag{49.23}$$

The power spectrum ($T$ constant) would therefore be a curve of type $y = 1/x^2$, with a large intensity at the origin. The periodogram, on the other hand, would have

$$S^2(\alpha) = \frac{4}{\alpha^2} = \frac{\lambda^2}{\pi^2},\quad\text{where } \lambda \text{ is the wavelength}. \tag{49.24}$$

The results are understandable in general terms. A trend is like a long wave which is equivalent to a low frequency. Evidently, if low frequencies are of interest, every endeavour must be made to remove trend from a series before spectrum methods are applied.


Test for the spectral ordinate

49.11 Harmonic terms in a series may be likened to point-densities in a probability distribution; in the spectrum they define lines, not continuous densities, although, of course, in practice these lines are blurred for finite series. We proceed to consider the behaviour of the spectrum for stationary series of the non-deterministic type, which can be represented as the weighted sum (finite or infinite) of a series of random variables.

Consider first of all the sums $a(\alpha)$ and $b(\alpha)$ of (49.8) and (49.9) when $u_t$ is a random series with zero autocorrelations and variance $\sigma^2$. Since, for large $n$,

$$\frac1n\sum_{k=1}^{n}\cos^2\alpha k \to \tfrac12,\qquad \frac1n\sum_{k=1}^{n}\sin^2\alpha k \to \tfrac12, \tag{49.25}$$

$$\frac1n\sum_{k=1}^{n}\cos\alpha k\,\sin\alpha k \to 0, \tag{49.26}$$

we see that $a$, $b$ are independent $N(0, \sigma^2/(2\pi))$ variables. Hence $2\pi I/\sigma^2 = 2\pi(a^2+b^2)/\sigma^2$ is distributed as $\chi^2$ with two degrees of freedom. Equivalently, the sum $S^2$ in the periodogram is distributed as

$$dF = \frac{n}{4\sigma^2}\exp\Big(-\frac{nS^2}{4\sigma^2}\Big)\,dS^2. \tag{49.27}$$

It follows that for the spectral ordinate, asymptotically,

$$\frac{\pi}{\sigma^2}E(I) = E(w) = 1, \tag{49.28}$$

$$\frac{\pi^2}{\sigma^4}\operatorname{var} I = \operatorname{var} w = 1 = \{E(w)\}^2. \tag{49.29}$$

Thus for a random series the standard error of the spectral ordinate is of the same order of magnitude as the ordinate itself.

The distribution (49.27) has been used to provide a test for ordinates in the periodogram. The probability that $S^2$ exceeds some value $4\sigma^2\kappa/n$ is $e^{-\kappa}$. In 1914, G. Walker pointed out that if $e^{-\kappa}$ is small, the probability that $m$ independent ordinates should not exceed $4\sigma^2\kappa/n$ is $(1-e^{-\kappa})^m$, so the chance that one at least exceeds that amount is

$$1-(1-e^{-\kappa})^m.$$
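Walker's probability is simple enough to tabulate directly. In the sketch below the function names and the numerical values ($m = 30$ ordinates, a 5 per cent level) are illustrative assumptions; the second function merely inverts the first to give the value of $\kappa$ that the largest of $m$ ordinates must exceed to be judged significant at a stated level.

```python
import math

def walker_probability(kappa, m):
    """Chance that at least one of m independent ordinates exceeds 4*sigma^2*kappa/n."""
    return 1.0 - (1.0 - math.exp(-kappa)) ** m

def kappa_for_level(level, m):
    """Solve 1 - (1 - exp(-kappa))**m = level for kappa."""
    return -math.log(1.0 - (1.0 - level) ** (1.0 / m))

m = 30                                   # illustrative number of ordinates examined
single = math.exp(-3.0)                  # chance for one pre-chosen ordinate, kappa = 3
overall = walker_probability(3.0, m)     # chance for the largest of m ordinates
critical = kappa_for_level(0.05, m)
print(single, overall, critical)
```

With $\kappa = 3$ a single ordinate exceeds the limit about once in twenty trials, but the largest of thirty ordinates does so far more often, which is the point of Walker's correction.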
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«■ ■■ ^js^sstsssj^itis r-s 
sar«rsOT.«- - ■«>-«- <*—-tfe *, 

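(49.8) and (49.9) in a single formula

$$J(\alpha) = a(\alpha)+ib(\alpha) = \frac{1}{\sqrt{n\pi}}\sum_{t=1}^{n}u_t\,e^{i\alpha t}. \tag{49.30}$$

The expectation of $J(\alpha)$ is zero. If $u_t$ is a random series with zero autocorrelation and variance $\sigma^2$, we have

$$E\{J(\alpha)J(\beta)\} = \frac{\sigma^2}{n\pi}\sum_{t=1}^{n}e^{i(\alpha+\beta)t} = \frac{\sigma^2}{n\pi}\,\frac{e^{i(\alpha+\beta)}\{1-e^{in(\alpha+\beta)}\}}{1-e^{i(\alpha+\beta)}}. \tag{49.31}$$

If $\alpha$, $\beta$ are of the form $2\pi p/n$, $p$ integral, this vanishes. Similarly it follows that $J(\alpha)$ is uncorrelated with the complementary $J^*(\beta)$, where $J^*(\alpha) = a(\alpha)-ib(\alpha)$, for $\alpha \neq \beta$. Thus $a(\alpha)$ and $b(\alpha)$ are uncorrelated in this case and the corresponding spectral ordinates are uncorrelated. We have $I(\alpha) = J(\alpha)J^*(\alpha)$, and putting $\beta = -\alpha$ in (49.31) we confirm the result that

$$E\{I(\alpha)\} = \frac{\sigma^2}{\pi}.$$

We have further

$$E\{I(\alpha)I(\beta)\} = \frac{1}{n^2\pi^2}E\Big\{\sum_{t=1}^{n}u_te^{i\alpha t}\sum_{s=1}^{n}u_se^{-i\alpha s}\sum_{k=1}^{n}u_ke^{i\beta k}\sum_{l=1}^{n}u_le^{-i\beta l}\Big\} = \frac{1}{n^2\pi^2}\sum\sum\sum\sum E(u_tu_su_ku_l)\exp\{i(\alpha t-\alpha s+\beta k-\beta l)\}. \tag{49.32}$$

The expectations vanish unless $t = s = k = l$ (giving the fourth-order moment of $u$) or the suffixes are equal in pairs (giving $\sigma^4$). Collecting terms, with $\kappa_4$ the fourth cumulant of $u$, we find

$$\operatorname{cov}\{I(\alpha), I(\beta)\} = \frac{\kappa_4}{n\pi^2}+\frac{\sigma^4}{n^2\pi^2}\Big[\frac{1-\cos\{n(\alpha+\beta)\}}{1-\cos(\alpha+\beta)}+\frac{1-\cos\{n(\alpha-\beta)\}}{1-\cos(\alpha-\beta)}\Big]. \tag{49.33}$$

If $\alpha = \beta$ we find, since $1-\cos\theta = \tfrac12\theta^2$ for $\theta$ small,

$$\operatorname{var} I(\alpha) = \frac{\sigma^4}{\pi^2}+O(n^{-1}), \tag{49.34}$$

confirming (49.29).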


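49.13 Consider now the case when $u_t$ is a weighted average of random variables $\varepsilon$, say

$$u_t = \sum_{s=0}^{\infty}g_s\,\varepsilon_{t-s}. \tag{49.35}$$

Then

$$J_u(\alpha) = \frac{1}{\sqrt{n\pi}}\sum_{s=0}^{\infty}\sum_{t=1}^{n}g_s\,\varepsilon_{t-s}\,e^{i\alpha t} \tag{49.36}$$

$$= \frac{1}{\sqrt{n\pi}}\sum_{s=0}^{\infty}g_s\,e^{i\alpha s}\sum_{t=1}^{n}\varepsilon_{t-s}\,e^{i\alpha(t-s)}$$

$$= \frac{1}{\sqrt{n\pi}}\sum_{s=0}^{\infty}e^{i\alpha s}g_s\sum_{t=1}^{n}\varepsilon_t\,e^{i\alpha t},\quad\text{approximately} \tag{49.37}$$

$$= J_\varepsilon(\alpha)\,h(\alpha) \tag{49.38}$$

where $h(\alpha)$ is the transform of $g_s$, namely

$$h(\alpha) = \sum_{s=0}^{\infty}g_s\,e^{i\alpha s}.$$

We have at once

$$I_u(\alpha) = I_\varepsilon(\alpha)\,h(\alpha)\,h^*(\alpha),$$

which is another form of the result obtained for the effect of a transfer function in 47.24 in the context of a continuous series. Further we obtain

$$E\{I_u(\alpha)\} = h(\alpha)\,h^*(\alpha)\,E\{I_\varepsilon(\alpha)\} \tag{49.39}$$

and asymptotically,

$$\operatorname{var} I_u(\alpha) = [E\{I_u(\alpha)\}]^2. \tag{49.40}$$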


Smoothing the spectrum 

49.14 These results provide us with a novel problem in estimation. The observed ordinate in the spectrum for a series of length $n$ does not have a variance of order $1/n$, but of order $n^0$. Furthermore, since the ordinates for values of $\alpha$ equal to $2\pi p/n$ are uncorrelated (exactly for normal variation and approximately otherwise), ordinates calculated for such values are effectively independent. The observed spectrum will thus fluctuate violently (Fig. 49.1 is a good example) and is a most unreliable estimator of the parent spectrum.

We shall attempt to overcome this difficulty by smoothing the spectrum, replacing $I(\alpha)$ by a weighted sum of neighbouring ordinates. This will render the estimator well-behaved in the sense of having a small variance, but to obtain such a result we have to pay a price in the form of bias in the estimator itself.


49.15 Let us take a weighting function $h(u)$ obeying the conditions

$$h(u) = h(u+2\pi), \qquad (49.41)$$

$$\int_{-\pi}^{\pi} h(u)\,du = 1. \qquad (49.42)$$

This function is variously known as a “kernel” or “spectral window.” If $I(\alpha)$ is the estimated intensity we construct the smoothed function

$$I_A(\alpha) = \int_{-\pi}^{\pi} h(u)\,I(\alpha-u)\,du = \int_{-\pi}^{\pi} h(\alpha-u)\,I(u)\,du. \qquad (49.43)$$

If $I(\alpha)$ is unbiassed we see, on taking expectations, that $I_A(\alpha)$ will, in general, be biassed, being a weighted average. To reduce the bias we desire $h(u)$ to be concentrated in a narrow range in the neighbourhood of $u = 0$, in which case the integral on the right in (49.43) will give an approximately unbiassed result. Unless $h(u)$ is the unit function at $u = 0$ there will, however, be some loss of resolution. Usually $h(u)$ is concentrated by definition, possessing an effective range which is narrower than the full range $-\pi$ to $+\pi$, and this effective range is called the bandwidth of the spectral window.

Replacing the integral in (49.43) by a sum over the points $u_j = 2\pi j/n$, we have

$$I_A(\alpha) = \frac{2\pi}{n}\sum_j h(u_j)\,I(\alpha-u_j), \qquad u_j = 2\pi j/n.$$

Since the values of $I$ are independent we then have

$$\operatorname{var} I_A(\alpha) = \frac{4\pi^2}{n^2}\sum_j h^2(u_j)\operatorname{var} I(\alpha-u_j) \qquad (49.44)$$

and using (49.40), this is approximately

$$\frac{4\pi^2}{n^2}\sum_j h^2(u_j)\,I^2(\alpha-u_j) = \frac{2\pi}{n}\int_{-\pi}^{\pi} h^2(u)\,I^2(\alpha-u)\,du. \qquad (49.45)$$

If $h(u)$ is concentrated in a narrow bandwidth this will give us, approximately,

$$\operatorname{var} I_A(\alpha) = \frac{2\pi}{n}\,I^2(\alpha)\int_{-\pi}^{\pi} h^2(u)\,du. \qquad (49.46)$$

Thus, provided that the integral is bounded, the variance is now of the order of $1/n$. It also follows that, to the same degree of approximation, var $\log I_A(\alpha)$ is a constant and hence that $\log I_A$ has confidence intervals of constant width.

It can also be shown that the correlation between $I_A(\alpha)$ and $I_A(\beta)$ is approximately equal to

$$\int_{-\pi}^{\pi} h(u)\,h(u+\alpha-\beta)\,du \Big/ \int_{-\pi}^{\pi} h^2(u)\,du. \qquad (49.47)$$




Calculation of spectra 

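49.16 To calculate spectral ordinates in practice we do not work out the sums (49.8) and (49.9) for varying $\alpha$ and then compute the intensity or the spectral density. In fact, we have

$$I(\alpha) = \frac{1}{n\pi}\Big\{\Big(\sum_{t=1}^{n} u_t\cos\alpha t\Big)^2 + \Big(\sum_{t=1}^{n} u_t\sin\alpha t\Big)^2\Big\}$$

$$= \frac{1}{n\pi}\sum_{t=1}^{n}\sum_{s=1}^{n} u_tu_s(\cos\alpha t\cos\alpha s + \sin\alpha t\sin\alpha s)$$

$$= \frac{1}{\pi}\sum_{k=-(n-1)}^{n-1} c_k\cos k\alpha, \qquad (49.48)$$

where $c_k$ is a covariance-type expression defined by

$$c_k = c_{-k} = \frac{1}{n}\sum_{t=1}^{n-k} u_tu_{t+k}, \qquad k \ge 0. \qquad (49.49)$$

For infinite $n$ this reduces to the known expression (cf. (47.27))

$$I(\alpha) = \frac{\sigma^2}{\pi}\,w(\alpha) = \frac{\sigma^2}{\pi}\sum_{k=-\infty}^{\infty}\rho_k\cos k\alpha. \qquad (49.50)$$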
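The identity (49.48) between the intensity computed from the cosine and sine sums and that computed from the covariance-type quantities $c_k$ can be checked numerically. The sketch below assumes the series is held in an ordinary Python list; the function names are illustrative only.

```python
import math


def intensity_direct(u, alpha):
    """I(alpha) from the squared cosine and sine sums, with divisor n*pi."""
    n = len(u)
    a = sum(x * math.cos(alpha * t) for t, x in enumerate(u, start=1))
    b = sum(x * math.sin(alpha * t) for t, x in enumerate(u, start=1))
    return (a * a + b * b) / (n * math.pi)


def serial_covariance(u, k):
    """c_k = (1/n) * sum of u_t u_{t+k}; no mean correction, as in (49.49)."""
    n = len(u)
    k = abs(k)
    return sum(u[t] * u[t + k] for t in range(n - k)) / n


def intensity_from_covariances(u, alpha):
    """I(alpha) = (1/pi) * sum over k = -(n-1)..(n-1) of c_k cos(k alpha)."""
    n = len(u)
    total = serial_covariance(u, 0)
    for k in range(1, n):
        total += 2.0 * serial_covariance(u, k) * math.cos(k * alpha)
    return total / math.pi


if __name__ == "__main__":
    series = [1.0, -2.0, 0.5, 3.0, -1.5]
    print(intensity_direct(series, 0.7), intensity_from_covariances(series, 0.7))
```

The two routes agree to rounding error for any series and any $\alpha$, which is why practical computation can start from the serial covariances.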


Calculation of the spectrum usually proceeds from (49.48). In any case we cannot compute $c_k$ for $k > n-1$ and in practice would rarely wish to go as far as $n$ correlations. Let us then consider the estimator

$$\hat I(\alpha) = \frac{1}{\pi}\sum_{k=-q}^{q}\lambda_k c_k\cos k\alpha, \qquad (49.51)$$

based on $q$ serial correlations. (The $\lambda$'s are constants to be chosen at convenience for the purpose of improving the estimator.) This is equivalent, in parental form, to

$$\hat I(\alpha) = \frac{\sigma^2}{\pi}\sum_{k=-q}^{q}\lambda_k\rho_k\cos k\alpha$$

or

$$\hat w(\alpha) = \sum_{k=-q}^{q}\lambda_k\rho_k\cos k\alpha. \qquad (49.52)$$

But from (47.22)

$$\rho_k = \frac{1}{\pi}\int_0^{\pi} w(u)\cos ku\,du,$$

and hence on substitution in (49.52) we find

$$\hat w(\alpha) = \frac{1}{\pi}\sum_{k=-q}^{q}\lambda_k\int_0^{\pi} w(u)\cos ku\cos k\alpha\,du = \int_0^{\pi} w(u)\Big\{\frac{1}{\pi}\sum_{k=-q}^{q}\lambda_k\cos ku\cos k\alpha\Big\}du. \qquad (49.53)$$

The use of (49.51) is then, asymptotically, equivalent to smoothing the spectrum by the weighting function

$$h(\beta) = \frac{1}{2\pi}\sum_{k=-q}^{q}\lambda_k\cos k\beta\cos k\alpha. \qquad (49.54)$$

Provided that $\lambda_0 = 1$, this obeys conditions (49.41) and (49.42). We also have

$$\int_{-\pi}^{\pi} h^2(\beta)\,d\beta = \frac{1}{2\pi}\sum_{k=-q}^{q}\lambda_k^2\cos^2 k\alpha. \qquad (49.55)$$


Example 49.3 

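Suppose, in the first instance, we take all $\lambda$'s equal to unity. Then

$$h(\beta) = \frac{1}{2\pi}\sum_{k=-q}^{q}\cos k\beta\cos k\alpha = \frac{1}{4\pi}\sum_{k=-q}^{q}\big[\cos\{(\alpha+\beta)k\} + \cos\{(\alpha-\beta)k\}\big]$$

$$= \frac{1}{4\pi}\left[\frac{\sin\{(q+\tfrac12)(\alpha+\beta)\}}{\sin\{\tfrac12(\alpha+\beta)\}} + \frac{\sin\{(q+\tfrac12)(\alpha-\beta)\}}{\sin\{\tfrac12(\alpha-\beta)\}}\right], \qquad (49.56)$$

and

$$\int_{-\pi}^{\pi} h^2(\beta)\,d\beta = \frac{1}{2\pi}\left[q + \tfrac12 + \tfrac12\,\frac{\sin\{(2q+1)\alpha\}}{\sin\alpha}\right]. \qquad (49.57)$$

Example 49.4 (Bartlett, 1950)

Take

$$\lambda_k = 1 - \frac{|k|}{q}.$$

We find

$$h(\beta) = \frac{1}{4\pi}\left[\frac{\sin^2\{\tfrac12 q(\alpha+\beta)\}}{q\sin^2\{\tfrac12(\alpha+\beta)\}} + \frac{\sin^2\{\tfrac12 q(\alpha-\beta)\}}{q\sin^2\{\tfrac12(\alpha-\beta)\}}\right] \qquad (49.58)$$

and

$$\int_{-\pi}^{\pi} h^2(\beta)\,d\beta = \frac{1}{2\pi}\left[\tfrac12 + \frac{(q-1)(2q-1)}{6q} + \frac{1}{2q\sin^2\alpha} - \frac{\sin 2q\alpha\cos\alpha}{4q^2\sin^3\alpha}\right]. \qquad (49.59)$$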
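The closed forms for the truncated and Bartlett windows can be checked against the defining sum (49.54). The following is a sketch under our own naming; the weights are stored as a list $[\lambda_0, \lambda_1, \ldots, \lambda_q]$ and are assumed symmetric in $k$.

```python
import math


def window_sum(weights, theta):
    """sum over k = -q..q of lambda_k cos(k theta) for symmetric weights."""
    return weights[0] + 2.0 * sum(
        w * math.cos(k * theta) for k, w in enumerate(weights) if k > 0
    )


def truncated_weights(q):
    """All lambda_k equal to unity (Example 49.3)."""
    return [1.0] * (q + 1)


def bartlett_weights(q):
    """lambda_k = 1 - |k|/q (Example 49.4)."""
    return [1.0 - k / q for k in range(q + 1)]


def spectral_window(weights, alpha, beta):
    """h(beta) of (49.54), using cos A cos B = [cos(A+B) + cos(A-B)] / 2."""
    return (window_sum(weights, alpha + beta)
            + window_sum(weights, alpha - beta)) / (4.0 * math.pi)


def truncated_closed_form(q, theta):
    return math.sin((q + 0.5) * theta) / math.sin(0.5 * theta)


def bartlett_closed_form(q, theta):
    return math.sin(0.5 * q * theta) ** 2 / (q * math.sin(0.5 * theta) ** 2)
```

The Bartlett form is never negative, whereas the truncated form changes sign, which is one reason the former is often preferred as a smoothing kernel.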


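Example 49.5 (Daniell, 1946)

Take

$$\lambda_k = \frac{\sin kh}{kh}, \qquad h > 0. \qquad (49.60)$$

We have the known integrals

$$\int_0^{\infty}\frac{\sin px\cos qx}{x}\,dx = \tfrac12\pi,\ |p| > |q|; \qquad = \tfrac14\pi,\ |p| = |q|; \qquad = 0,\ |p| < |q|. \qquad (49.61)$$

The weighting function is then given by

$$h(\beta) = \frac{1}{2\pi}\Big\{1 + 2\sum_{k=1}^{q}\cos\beta k\cos\alpha k\,\frac{\sin hk}{hk}\Big\}, \qquad (49.62)$$

which is approximated by the integral

$$\frac{1}{\pi}\int_0^{\infty}\frac{\sin hx}{hx}\cos\beta x\cos\alpha x\,dx = \frac{1}{2\pi}\int_0^{\infty}\frac{\sin hx}{hx}\big[\cos\{(\alpha+\beta)x\} + \cos\{(\alpha-\beta)x\}\big]dx.$$

By (49.61) each term in the bracket contributes $1/(4h)$ when its argument lies between $-h$ and $h$, so that the term in $\alpha-\beta$ gives

$$\frac{1}{4h},\quad h \ge \alpha-\beta \ge -h; \qquad = 0 \text{ elsewhere}, \qquad (49.63)$$

and similarly for the term in $\alpha+\beta$.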

Various other kernels have been suggested, notably by Blackman and Tukey (1958) 
and Parzen (1961). See Exercises 49.5-7 and a review by Jenkins (1961). 


Estimation of spectral densities 

49.17 The problem of estimating spectral densities has received a great deal of attention, and a complete account of the subject (which is itself by no means complete) would occupy more space than we can allot to it. We must content ourselves with a summary account of the principles.

The object is to provide good estimates of the ordinates along the length of the power spectrum. (This is not usually the ultimate object of the analysis, a point which is apt to be overlooked.) We have seen that the ideal may be unattainable for various reasons. We therefore introduce the kernel or spectral window to smooth
out the grosser irregularities in the observed spectrum. A “ good ” kernel will be 
relatively narrow in range, but no kernel can be perfect, and its values may be unduly 
influenced by casual peaks in the spectrum—there is then said to be “ leakage ” round 
the edges of the “ spectral window ” which we are using to scrutinize part of the 
spectrum. We are thus led to consider the effectiveness of different kernels in smooth¬ 
ing, and hence introducing reliability, as against averaging, and hence introducing bias. 
For some of the procedures possible in this context reference may be made to Whittle 
(1957), who considers a prior distribution of spectral ordinates, Blackman and Tukey 
(1958), who discuss the use of prior analysis of the data, and Parzen (1961), who con¬ 
siders, among other things, the prior determination of the rate of decay of the 
autocorrelations. 


49.18 Leakage causes trouble, especially in estimates of the low part of the 
spectrum, for the kernel, though itself small in the outlying parts of its range, may 
swamp the average when multiplied by a high value of the spectral ordinate. For 
this reason Blackman and Tukey (1958) introduced a process known as pre-whitening, 
the object of which is to filter the series so that the peaks are flattened out. For example, 
if the original spectrum has a peak at $\alpha_1$ and we can transform the original series so
that this peak is flattened out, the estimate at some other point $\alpha_2$ will no longer be
distorted by the peak at $\alpha_1$. This is obviously a rather dangerous procedure, but fortunately we
can afterwards recolour the spectrum, in the terminology of Nerlove (1964). The
basic idea rests on the result we proved in 47.24, that if $v(t)$ is a filtered series derived
from $u(t)$ by a linear filter, then

$$w_v(\alpha) = w_u(\alpha)\,|f(\alpha)|^2, \qquad (49.64)$$

where $f(\alpha)$ is the transfer function of the filter itself. Knowing the filter, we can
always recover the spectrum of the original series from the estimated spectrum of the 
transformed series. The procedure has been examined by Hext (1964). It may well 
be desirable to use different procedures for different parts of the spectrum. 
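As an illustration of (49.64), consider the simplest pre-whitening filter $v_t = u_t - au_{t-1}$. This particular filter is our own choice for the sketch, not one prescribed in the text; its transfer function is $1 - ae^{-i\alpha}$, so that $|f(\alpha)|^2 = 1 - 2a\cos\alpha + a^2$. All names below are hypothetical.

```python
import math


def prewhiten(u, a):
    """Apply the assumed filter v_t = u_t - a*u_{t-1}; one term is lost at the start."""
    return [u[t] - a * u[t - 1] for t in range(1, len(u))]


def filter_gain(a, alpha):
    """|f(alpha)|^2 for the filter above: 1 - 2a cos(alpha) + a^2."""
    return 1.0 - 2.0 * a * math.cos(alpha) + a * a


def recolour(w_v, a, alpha):
    """Recover w_u(alpha) from an estimate of w_v(alpha) by inverting (49.64)."""
    return w_v / filter_gain(a, alpha)
```

With $a$ positive the gain is smallest near $\alpha = 0$, so a low-frequency peak is flattened before estimation and restored afterwards by the division in `recolour`.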


49.19 Daniels (1962) develops some alternative approaches. He considers first 
of all a preliminary smoothing by a kernel chosen so as to obey a given criterion, e.g. so 
as to achieve the minimum tolerable resolution. Then he unsmooths the spectrum by 
setting up a routine which improves the resolution at the expense of the sampling 
variance until no further useful change is detectable in the fitted spectrum. Two 
unsmoothing processes are discussed, one approximating the spectrum locally by a 
polynomial, the other based on differences of the spectrum. The process is empirical 
in the sense that it uses the data to determine the estimator, and it requires a good 
deal of computation, but it at least proceeds by successive approximation to a stable 
solution. 


49.20 In connexion with equation (49.64) we may note some work by Hannan (1960) and Durbin (1961) concerning the effect of seasonal variation and trend-elimination on the spectrum. We have already remarked on the problems created by the elimination of trend in distorting the residuals. With (49.64) we can regenerate the spectrum distorted by trend-removal, at least in theory. But this does not, of course, enable us to regenerate the residuals themselves.

49.21 It must always be remembered that we are not interested in the periodogram or the power spectrum for its own sake, except perhaps in the domain of physics or electrical engineering, where an ordinate in the spectrum can be given a physical interpretation as the amount of power which is obtainable at a given cycle frequency. For general statistical purposes the spectrum is a diagnostic instrument whose main use is to suggest an appropriate model to generate the series under observation. Interest therefore tends to be focused on hypotheses concerning the model rather than testing peaks in a correlogram or a spectrum, and this is a subject we consider in the next chapter. For more extensive studies of spectrum analysis reference may be made to Blackman and Tukey (1958), Grenander and Rosenblatt (1957), Granger and Hatanaka (1964), and the symposium edited by Rosenblatt (1963).

Unequal time-intervals

49.22 Finally we may add a few comments on a point of some practical importance where daily or monthly observations are concerned. Suppose we have a series $u_1$, $u_2$, etc. observed at intervals of $m$, so that our information consists of observations $u(m)$, $u(2m)$, etc. For example, we may observe a daily series once a month, in which case $m$ is, on the average, about 30. Suppose further that the intervals between observations now vary about $m$ to some extent, so that we have instead observations $u(m+\varepsilon_1)$, $u(2m+\varepsilon_2)$, ..., etc. If $u$ has zero mean and unit variance, which we may assume without loss of generality, the autocorrelations of the original observations are given by

$$\rho(km) = E[u(tm)\,u\{(t+k)m\}], \qquad k = 0, 1, \ldots. \qquad (49.65)$$

Those of the second series are, say $\rho^*(km)$, given by

$$\rho^*(km) = E\,\frac{1}{n}\sum_{p=1}^{n} u(pm+\varepsilon_p)\,u\{(p+k)m+\varepsilon_{p+k}\} = \frac{1}{n}\sum_{p=1}^{n}\rho(km+\varepsilon_p-\varepsilon_{p+k}). \qquad (49.66)$$

Expanding $\rho$ either as a Taylor series or an equivalent series of differences, we then have, to the second order in $\varepsilon$,

$$\rho^*(km) = \rho(km) + \frac{1}{n}\sum_{p=1}^{n}(\varepsilon_p-\varepsilon_{p+k})\,\rho'(km) + \frac{1}{2n}\sum_{p=1}^{n}(\varepsilon_p-\varepsilon_{p+k})^2\,\rho''(km). \qquad (49.67)$$

If $\varepsilon$ has zero mean the second term is small and vanishes in the limit. Writing $\sigma^2$ for the variance of $\varepsilon$ and $r_k$ for its $k$th autocorrelation, we then have

$$\rho^*(km) = \rho(km) + \sigma^2(1-r_k)\,\rho''(km). \qquad (49.68)$$

On the average, then, the autocorrelations are not seriously disturbed so long as $\sigma^2$ is small.
For example, consider observations made on the first of the month instead of daily. The average month-length, taking account of leap years, is 30.437 with a variance of 0.70. The first autocorrelation $r_1$ is $-0.42$, and the second is positive. On our scale $\sigma^2 = 0.70/(30.437)^2 = 0.00076$. The autocorrelations based on the first of the month will then, from (49.68), be only slightly affected on average, provided that $\rho''$, or the second difference of $\rho$, is not very large, which is so.

Similar arguments apply to the power spectrum. Low frequencies are emphasized slightly, but the effect is negligible.
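The arithmetic of this example may be sketched as follows. The four-year calendar cycle and the divisor-$n$ variance are assumptions of the sketch; the variance so obtained is a little below the 0.70 quoted above, which may rest on a different convention. The value of $\rho''$ used in the usage check is purely illustrative.

```python
def month_lengths_four_year_cycle():
    """Lengths of the 48 months in three common years followed by one leap year."""
    common = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    lengths = []
    for year in range(4):
        year_lengths = list(common)
        if year == 3:
            year_lengths[1] = 29  # February in the leap year
        lengths.extend(year_lengths)
    return lengths


def mean_and_variance(values):
    """Mean and variance with divisor n."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, variance


def disturbed_autocorrelation(rho, rho_second, sigma2, r_k):
    """rho*(km) from (49.68): rho + sigma^2 (1 - r_k) rho''."""
    return rho + sigma2 * (1.0 - r_k) * rho_second


if __name__ == "__main__":
    mean, variance = mean_and_variance(month_lengths_four_year_cycle())
    print(mean, variance, 0.70 / 30.437 ** 2)
```

With $\sigma^2$ of the order of 0.0008 the correction term in (49.68) is negligible unless $\rho''$ is very large.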

49.23 The matter stands differently for series which are aggregated, such as rainfall. The effect of differing time-intervals may then be serious, as is fairly evident
when we remember that we may be comparing sums based (in the case of months and 
days) on 28 or 31 observations. It is, in our opinion, essential in such cases to 
standardize the data by reducing them to a period of constant length. This is particu¬ 
larly true for such data as output per working week or inputs per working month. 

Granger (1963) has discussed the matter in more detail from the spectrum viewpoint. 
See also Quenouille (1958). 


EXERCISES 

49.1 $P(t)$ is a polynomial of degree $k$ for $0 < t < T$. Show that asymptotically the ordinate in
the periodogram corresponding to frequency $\alpha$ is given by

4P 2 (T) , ^,_ g) 


a 2 T 2 


-0(T 2/; - 


49.2 A series has the value e ct in the range 0 to T, c > 0. Show that asymptotically the 

ordinate in the periodogram corresponding to frequency a is 

16 exp (2 cT) 

-—-sin 2 Ja T. 


49.3 Given that

$$x = \frac{4}{\pi}\Big(\sin x - \frac{\sin 3x}{3^2} + \frac{\sin 5x}{5^2} - \ldots\Big), \qquad -\tfrac12\pi \le x \le \tfrac12\pi,$$

graph the series whose term is $x$ over the range 0 to $4\pi$. Compare with 49.10 and comment on
the effects on the power spectrum.


49.4 Establish equation (49.47). 


49.5 In (49.54) take 


l ]c = 1-2a+2a cos (nk/q). 


Show that 



n 


where y = a— /?, 


sin {q+\)y /sin {(q + \)y+n/q) . sin {{q+\)y-n /q}Y 1 

(i —la) — in ^7 • / sin (fo+n/qj ' sin {\y-n/q) J J 

together with a similar term obtained by putting y = a + 0. 

(Tukey, cf. Blackman and Tukey, 1958. They 
propose the values a = 0-25 or a = 0’23.) 


(Parzen, 1961) 


b eorv oF 


aD VANCE d 
—- - thc 




sin qVj°l 
si tiHr 



statistics 

vvith 


in ,he previous ex.rc.se, 


show that 


4nq* (sin 


\y 

" sin 2 iY J 

0<k< %4’ 
Q> 

lYjV 

irJ ' 


(Parzen, 1961) 


, , _ ,. , U V2 y Q = /tJ/4- 3, show that asymptoti- 

498 If «< is stationary and normal, and yi — Pa/Pa .» 
cally y, is W, *i) with 

R\ = 6 S P) 


and that y 2 is N(0 , i? 2 ) 


i? 2 = 24 S pf. 


Show how this may be used to test for normality of a stationary process. 

(Lommcki, 1961) 

49.9 The Buys-Ballot table. A series of pp terms is written down in p rows of fx thus: 

P\ Mg ... 

«A + 1 u H+2 • • • u 2n 


«(p-l)i«+l M(p-l) jU +2 . . . U, 


PA 


% ... m u 


Sums: m 1 

Show that the sums A and B entering into the periodogram are given by 


49.10 With reference to the 
Show that if 


A — ^ y 2nj 

p7i?r ,cos T’ 

B = m S m> sin 

Previous exercise, consider 
= var m/var u. 


where bj is uncorrelated with P er iodic = terms/then^^^ > 


v 2 (p) 


= ( a2 J^^(nn/X) 
\ 2 n 2 sin 2 i'//iwiT 


s » ! (PVA) + ;va r 6jy / (j il! 


+ var b 


)- 




KSP S 
Hence show that, in the neighbourhood of $\lambda$, the graph of $\eta$ as ordinate against $\mu$ as abscissa
(Whittaker’s periodogram) has a peak of breadth 2 A 2 /n, flanked by 1911) 


49.11 If an autoregressive series of Yule type (47.74) is subject to errors of. observ^io^ 
which are independent from term to term, show that the serial correlations (except r 0 ) are re 


in constant proportion, say c. 


Hence, if a, in 

w H-2 + a w$+i+/fai = e t + 2 

are estimated from the Yule-Walker equations (47.66) as a\ b show that, for the series su jec 
to error, 

b’/a >b/a y 

where a , b refer to estimates for the same series not subject to error. 

49.12 The following are the spectral densities computed for the series of Table 47.4 and Fig. 49.3 for smoothing with a Parzen window (Exercise 49.7) and various truncation points, i.e. the number $q$ of serial correlations computed. Sketch the power spectra and note the disturbing effect of having $q$ too large.

Frequency            No
(cycles per year)    smoothing    q = 15     q = 20     q = 30

0.0000                0.0000      19.3065    15.5732    11.6349
0.0156               10.4916      20.0120    18.2743    15.8810
0.0313               39.5419      21.8959    21.9328    24.0449
0.0469               18.6496      24.3610    25.1102    28.3488
0.0625               40.5257      26.6535    27.2150    26.1922
0.0781                1.1554      28.0290    29.1099    25.9544
0.0938               34.9011      27.8621    30.4033    33.5149
0.1094               52.4760      25.8092    28.9764    35.8784
0.1250               36.6309      22.0280    23.7428    25.2362
0.1406                2.7163      17.2623    16.5376    13.2443
0.1563                3.7282      12.6006    10.4166     7.8444
0.1719                9.6669       8.9806     6.9051     5.9028
0.1875                2.5688       6.7729     5.5586     4.6738
0.2031                0.0968       5.7301     5.3329     4.8532
0.2188               16.3082       5.2859     5.4431     5.9667
0.2344                5.5837       4.9314     5.3382     6.0461
0.2500                7.7572       4.4191     4.7363     4.9161
0.2656                1.3151       3.7474     3.7738     3.5804
0.2813                0.4748       3.0386     2.8106     2.5929
0.2969                2.4491       2.4260     2.0975     1.8886
0.3125                0.9332       1.9872     1.7045     1.3929
0.3281                1.9702       1.7203     1.5886     1.3985
0.3438                0.7401       1.5573     1.6005     1.7491
0.3594                4.0405       1.4114     1.5402     1.8142
0.3750                0.3481       1.2300     1.3106     1.3800
0.3906                0.4831       1.0181     0.9851     0.8537
0.4063                0.5796       0.8206     0.7045     0.5774
0.4219                0.1827       0.6847     0.5519     0.4695
0.4375                0.4529       0.6311     0.5288     0.4463
0.4531                0.4774       0.6486     0.5974     0.5478
0.4688                0.1512       0.7041     0.7097     0.7256
0.4844                1.1829       0.7582     0.8112     0.8687
0.5000                1.8403       0.7801     0.8521     0.9190
CHAPTER 50

TIME-SERIES: SOME FURTHER TOPICS 


50.1 The theory of time-series has not reached a stage, and may never reach a 
stage, at which a clearly structured account of it can be given. To some extent this is
due to the complicated nature of the subject: we have to take account not only of
probability distributions but of their autocorrelations over time, and the embarrassing 
profusion of parameters which results may make it difficult to choose among a sizeable 
set of different hypotheses which are all consonant with the data. In some fields 
especially in economics, experiences are rarely long enough to enable us to lean as 
heavily on our models as we can, for example, in physics. A run of fifty years’ data is 
“ long ” as such series go, and even if longer, may arise from a system which is itself 
undergoing important structural change. 


50.2 The advent of the electronic computer has removed most of the tedium which 
was a serious obstacle to former workers on time-series analysis, but there remain the 
problems of formulating and testing hypotheses or of setting up a model of the system 
under study. For this reason, a working statistician very often needs to call in aid a great deal of extraneous information of a non-statistical, perhaps a non-quantifiable, kind, in order to define his problem and to set up his models. We shall not attempt a review of the considerations and methods to which he must have regard in this non-statistical part of his work. We take them for granted, and in this final chapter shall consider certain purely statistical aspects of the subject: estimation and hypothesis testing, multivariate questions and forecasting.




Estimation 

50.3 We first notice a difficulty which is peculiar to time-series analysis and which arises in any attempts to reach exact results in problems of estimation or hypothesis testing. It will make for clarity if we discuss a Markoff scheme, although the argument is general. For a set of observations $u_1, u_2, \ldots, u_n$ we have

$$u_1 = \rho u_0 + \varepsilon_1, \quad u_2 = \rho u_1 + \varepsilon_2, \quad \ldots, \quad u_n = \rho u_{n-1} + \varepsilon_n. \qquad (50.1)$$

If the probability distribution of the $\varepsilon$'s were known, say $f(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$, we might regard (50.1) as determining a variate transformation to new variables $u_1, u_2, \ldots, u_n$. But here we encounter a difficulty: $u_0$ is also involved. We have, in fact, $n$ variables $\varepsilon$ and $n+1$ variables $u$. Let us then add to (50.1) the supplementary

$$u_0 = u_0. \qquad (50.2)$$

We know that $u_0$ is dependent only on $\varepsilon_0$, $\varepsilon_{-1}$, etc. and hence is independent of $\varepsilon_1, \ldots, \varepsilon_n$. If its frequency function is $g(u_0)$ we then have for the joint distribution of $u_0, \varepsilon_1, \ldots, \varepsilon_n$,

$$dF = f(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)\,g(u_0)\,d\varepsilon_1\,d\varepsilon_2\ldots d\varepsilon_n\,du_0. \qquad (50.3)$$

Let us now make the transformation to variables $u_0, u_1, \ldots, u_n$. The Jacobian is easily seen to be unity, and we find

$$dF = f(u_1-\rho u_0,\ u_2-\rho u_1,\ \ldots,\ u_n-\rho u_{n-1})\,g(u_0)\,du_0\,du_1\ldots du_n. \qquad (50.4)$$

To manipulate this expression for the purpose of deriving estimators or tests we require to dismiss the element $u_0$, which is unknown. (If it were known we should have started the series of observations with it.) There are several ways of doing this, but they all involve some sort of limitation on our inference:

(a) we may assume $u_0$ known and make the inference conditional upon it;

(b) we may make the sample circular, i.e. assume that $u_n = u_0$;

(c) we may neglect $u_0$ by showing that asymptotically its effect is negligible.

50.4 The method we shall consider is the third. Suppose, for example, that the $\varepsilon$'s are distributed normally with unit variance. Then $u_0$ will be normal with variance $1/(1-\rho^2)$, and for $\log L$ we have, apart from constants,

$$\log L = \tfrac12\log(1-\rho^2) - \tfrac12\sum_{j=1}^{n}(u_j-\rho u_{j-1})^2 - \tfrac12(1-\rho^2)u_0^2. \qquad (50.5)$$

The term involving $u_0$ is seen to be $-\tfrac12u_0^2 + \rho u_0u_1$, and we integrate this out to obtain

$$\log L = \text{const.} + \tfrac12\log(1-\rho^2) - \tfrac12\sum_{j=2}^{n}(u_j-\rho u_{j-1})^2 - \tfrac12(1-\rho^2)u_1^2. \qquad (50.6)$$

For large $n$ the summation dominates $\log L$, and asymptotically we have

$$\log L \sim -\tfrac12\sum_{j=2}^{n}(u_j-\rho u_{j-1})^2. \qquad (50.7)$$

We can estimate $\rho$ by maximizing this likelihood, which is equivalent to minimization of a sum of squares over $(n-1)$ terms. Apart from the approximation, the results are what we would have got by treating the autoregression as an ordinary regression. The ML estimator is then

$$\hat\rho = \sum_{j=2}^{n}u_ju_{j-1}\Big/\sum_{j=2}^{n}u_{j-1}^2.$$
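The estimator just obtained is simply the regression coefficient of each term on its predecessor. The sketch below simulates a Markoff scheme with normal errors of unit variance and applies the estimator; the simulation routine, its seed argument and all names are assumptions of the sketch.

```python
import random


def simulate_markoff(n, rho, seed):
    """Generate u_1..u_n from u_t = rho*u_{t-1} + e_t with unit-variance normal e."""
    rng = random.Random(seed)
    # Start from the stationary distribution, variance 1/(1 - rho^2).
    previous = rng.gauss(0.0, 1.0) / (1.0 - rho * rho) ** 0.5
    series = []
    for _ in range(n):
        previous = rho * previous + rng.gauss(0.0, 1.0)
        series.append(previous)
    return series


def estimate_rho(u):
    """Sum of u_j u_{j-1} over sum of u_{j-1}^2, j = 2..n."""
    numerator = sum(u[j] * u[j - 1] for j in range(1, len(u)))
    denominator = sum(u[j - 1] ** 2 for j in range(1, len(u)))
    return numerator / denominator


if __name__ == "__main__":
    print(estimate_rho(simulate_markoff(5000, 0.6, seed=1)))
```

For an exactly geometric series such as 1, 0.5, 0.25, 0.125 the estimator returns the common ratio.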


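50.5 The same point may be made in a different way. The variance of $u_j$ is $1/(1-\rho^2)$ and the correlation of $u_i$ and $u_j$ is $\rho^{|i-j|}$. Thus the dispersion matrix of $u_1, u_2, \ldots, u_n$ is

$$\mathbf{V} = \frac{1}{1-\rho^2}\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \vdots & & & & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}. \qquad (50.8)$$

The determinant of this is found to be $1/(1-\rho^2)$ and the inverse is

$$\mathbf{V}^{-1} = \begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0 \\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0 \\ 0 & -\rho & 1+\rho^2 & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}. \qquad (50.9)$$

Hence

$$\log L = \tfrac12\log(1-\rho^2) - \tfrac12\Big\{\sum_{j=2}^{n}(u_j-\rho u_{j-1})^2 + (1-\rho^2)u_1^2\Big\}, \qquad (50.10)$$

which brings us back to (50.6).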
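The statement that the inverse of the dispersion matrix is tridiagonal, and that the quadratic form reduces to the expression in braces in (50.10), may be verified numerically. The sketch below uses plain lists and assumes $n \ge 2$; the function names are ours.

```python
def dispersion_matrix(n, rho):
    """V of (50.8): rho^{|i-j|} / (1 - rho^2)."""
    scale = 1.0 / (1.0 - rho * rho)
    return [[scale * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def inverse_dispersion_matrix(n, rho):
    """Tridiagonal form of (50.9): end diagonal elements 1, the others 1 + rho^2."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0 if i in (0, n - 1) else 1.0 + rho * rho
        if i + 1 < n:
            m[i][i + 1] = -rho
            m[i + 1][i] = -rho
    return m


def matrix_product(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def quadratic_form(m, u):
    n = len(u)
    return sum(m[i][j] * u[i] * u[j] for i in range(n) for j in range(n))


def braces_of_50_10(u, rho):
    """Sum over j = 2..n of (u_j - rho*u_{j-1})^2, plus (1 - rho^2) u_1^2."""
    total = sum((u[j] - rho * u[j - 1]) ** 2 for j in range(1, len(u)))
    return total + (1.0 - rho * rho) * u[0] ** 2
```

The product of the two matrices is the unit matrix, and the two forms of the exponent agree for any observed series.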

50.6 A second general point to notice concerns the relationship between autoregressive and moving-average schemes, already remarked upon in 47.18. We illustrate the point by reference to the scheme

$$u_t = \varepsilon_t + \beta\varepsilon_{t-1}. \qquad (50.11)$$

The dispersion matrix of the series (with unit variance for $\varepsilon$) is

$$\begin{pmatrix} 1+\beta^2 & \beta & 0 & 0 & \cdots & 0 \\ \beta & 1+\beta^2 & \beta & 0 & \cdots & 0 \\ 0 & \beta & 1+\beta^2 & \beta & \cdots & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1+\beta^2 \end{pmatrix}. \qquad (50.12)$$

With $\beta = -\rho$ this is nearly the same as (50.9), but the difference is not negligible and (50.12) is not so easy to invert. Consider, however, (50.8) with $\rho = -\beta$, namely

$$\frac{1}{1-\beta^2}\begin{pmatrix} 1 & -\beta & \beta^2 & \cdots & (-\beta)^{n-1} \\ -\beta & 1 & -\beta & \cdots & (-\beta)^{n-2} \\ \vdots & & & & \vdots \\ (-\beta)^{n-1} & (-\beta)^{n-2} & (-\beta)^{n-3} & \cdots & 1 \end{pmatrix}. \qquad (50.13)$$

If, in (50.11), we modify the model slightly so that the variance of $\varepsilon_0$ is zero, then var $u_1 = 1$. Likewise if $\varepsilon_n$ has variance $1-\beta^2$, var $u_n = 1$. The other variances and covariances are unaffected and hence (50.13) represents the inverse of the dispersion matrix of the modified scheme, which clearly is asymptotically the same as the scheme (50.11) since only two end terms have been altered.

Thus, to this degree of approximation, the log likelihood is given by

$$\log L = \text{const.} - \tfrac12\log(1-\beta^2) - \frac{1}{2(1-\beta^2)}\Big\{\sum_{j=1}^{n}u_j^2 - 2\beta\sum_{j=1}^{n-1}u_ju_{j+1} + 2\beta^2\sum_{j=1}^{n-2}u_ju_{j+2} - \ldots\Big\}. \qquad (50.14)$$

In this expression all the observed serial covariances are involved, whereas for an autoregressive scheme we need only as many serial covariances as there are constants to estimate. Even if we neglect the terms outside braces in (50.14) we are still left with a cumbrous likelihood function to manage, and in particular the ML equations are intractable.
Example 50.1

It might be supposed that the difficulty could be overcome by using a different estimator. For example, we have for the first autocorrelation of (50.11)

$$\rho_1 = \beta/(1+\beta^2). \qquad (50.15)$$

It is plausible, then, to estimate $\beta$ by solving

$$\frac{b}{1+b^2} = r_1. \qquad (50.16)$$

But unfortunately, as Whittle (1953a) showed, this is a very inefficient estimator. In fact, for the asymptotic variance of $r_1$ (equation (48.9)) we have

$$n\operatorname{var} r_1 = \sum_{i=-\infty}^{\infty}\{\rho_i^2 + \rho_{i-1}\rho_{i+1} - 4\rho_1\rho_i\rho_{i+1} + 2\rho_1^2\rho_i^2\},$$

which, in our present case, reduces to

$$n\operatorname{var} r_1 = 1 - 3\Big(\frac{\beta}{1+\beta^2}\Big)^2 + 4\Big(\frac{\beta}{1+\beta^2}\Big)^4. \qquad (50.17)$$

From (50.15) we have

$$dr_1 = \frac{1-b^2}{(1+b^2)^2}\,db, \qquad (50.18)$$

and hence, asymptotically,

$$n\operatorname{var} b = \{(1+\beta^2)^4 - 3\beta^2(1+\beta^2)^2 + 4\beta^4\}/(1-\beta^2)^2. \qquad (50.19)$$

For example, with $\beta = \tfrac12$ we find

$$n\operatorname{var} b = \frac{389}{144}. \qquad (50.20)$$

Taking $\log L$ in the form

$$-\frac{1}{2(1-\beta^2)}\Big\{\sum u_j^2 - 2\beta\sum u_ju_{j+1} + 2\beta^2\sum u_ju_{j+2} - \ldots\Big\} = -\frac{A}{2(1-\beta^2)}, \text{ say,}$$

we find

$$E(A) = n(1-\beta^2), \qquad E\Big(\frac{\partial A}{\partial\beta}\Big) = -2n\beta, \qquad E\Big(\frac{\partial^2A}{\partial\beta^2}\Big) = 0,$$

$$\frac{\partial^2}{\partial\beta^2}\log L = -\frac12\Big\{A\Big(\frac{8\beta^2}{(1-\beta^2)^3} + \frac{2}{(1-\beta^2)^2}\Big) + \frac{4\beta}{(1-\beta^2)^2}\frac{\partial A}{\partial\beta} + \frac{1}{1-\beta^2}\frac{\partial^2A}{\partial\beta^2}\Big\},$$

whence

$$E\Big(\frac{\partial^2}{\partial\beta^2}\log L\Big) = -\frac{n}{1-\beta^2},$$

and thus the variance of the ML estimator is given (cf. (18.60)) by

$$\operatorname{var}\hat\beta = \frac{1-\beta^2}{n}. \qquad (50.21)$$

For $\beta = \tfrac12$ this reduces to $3/(4n)$, and comparison with (50.20) shows that the estimator $b$ from (50.16) has a variance 3.6 times the optimum value.

The result is unexpected but easy to understand. The estimator of (50.16) uses only the first serial correlation and forfeits the information in the other serials which, as we have seen, all appear in the likelihood function.
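The comparison of the two variances can be reproduced exactly with rational arithmetic. The sketch below evaluates the asymptotic variance of the estimator based on the first serial correlation and that of the ML estimator, each multiplied by $n$; the function names are ours.

```python
from fractions import Fraction


def n_var_r1(beta):
    """n var r_1 = 1 - 3 rho_1^2 + 4 rho_1^4 with rho_1 = beta / (1 + beta^2)."""
    rho1 = beta / (1 + beta ** 2)
    return 1 - 3 * rho1 ** 2 + 4 * rho1 ** 4


def n_var_b(beta):
    """n var b, obtained from n var r_1 and db/dr_1 = (1 + beta^2)^2 / (1 - beta^2)."""
    return n_var_r1(beta) * (1 + beta ** 2) ** 4 / (1 - beta ** 2) ** 2


def n_var_ml(beta):
    """n times the variance of the ML estimator: 1 - beta^2."""
    return 1 - beta ** 2


if __name__ == "__main__":
    beta = Fraction(1, 2)
    print(n_var_b(beta), n_var_ml(beta), float(n_var_b(beta) / n_var_ml(beta)))
```

For $\beta = \tfrac12$ the ratio of the two variances is $389/108$, or about 3.6.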


Estimation in autoregressive series 

50.7 For the general linear autoregressive series

$$\sum_{j=0}^{k}\alpha_ju_{t-j} = \varepsilon_t \qquad (50.22)$$

the same kind of argument as we used in 50.4 shows that, with the usual neglect of end-effects, the ML estimators in normal variation are given by minimizing

$$\sum_{t=k+1}^{n}\Big(\sum_{j=0}^{k}\alpha_ju_{t-j}\Big)^2,$$

which gives rise to the Yule-Walker equations (47.66). Or equivalently, we can treat the estimation problem as one in ordinary regression.

The basic theorem on this subject is due to Mann and Wald (1943), who proved rigorously that asymptotically the sampling properties of least-squares estimators $\hat\alpha$ of $\alpha$ are the same as those of least-squares regression estimators in multivariate normal systems. This useful result is enough for most practical purposes. Experimental studies on series generated by rectangularly distributed $\varepsilon$'s, and for moderate lengths of 60 terms, indicate that the Yule-Walker equations can safely be used in such cases, although it is desirable to correct the estimates of autocorrelations for bias by Quenouille's method (cf. 48.7).

50.8 We often wish to compare an autoregressive scheme of order $k$ with one of order $k+1$. That is to say, if we assume that a series is autoregressive, how far do we carry the regressions? From what has been said it will be evident that we can go on fitting extra terms in the regression until there is no appreciable diminution in the sum of squares. In fact autoregression is simpler than ordinary regression because we do not have to face the usual problem of how to reject “insignificant” variables when the regressors are of many types.

Example 50.2 

In Table 45.4 we gave a series of figures of the sheep population of England and Wales from 1867 to 1939. Fig. 45.4 indicates that the downward trend in these figures is approximately linear. In Table 50.1 we show the residuals in this series after the elimination of trend by a simple nine-point moving average.

Table 50.1. Residual values of the sheep series of Table 45.4 after elimination of trend by a simple nine-point moving average

Year    Residual      Year    Residual      Year    Residual
        (10,000)              (10,000)              (10,000)

1871     -176         1893     + 34         1915     + 19
1872     -112         1894     -103         1916     +128
1873     + 50         1895     -104         1917     + 97
1874     +141         1896     - 15         1918     + 69
1875     + 60         1897     - 23         1919     - 29
1876     - 20         1898     + 17         1920     -174
1877     + 12         1899     + 71         1921     -107
1878     + 82         1900     + 35         1922     -142
1879     +130         1901     + 16         1923     -109
1880     - 14         1902     - 27         1924     - 23
1881     -166         1903     - 32         1925     + 60
1882     -179         1904     - 49         1926     +121
1883     - 84         1905     - 61         1927     + 94
1884     + 38         1906     - 52         1928     - 25
1885     + 97         1907     - 24         1929     - 90
1886     +  8         1908     + 68         1930     - 75
1887     -  5         1909     +141         1931     + 72
1888     -105         1910     +119         1932     +152
1889     - 99         1911     + 66         1933     +112
1890     + 35         1912     - 52         1934     - 64
1891     +159         1913     -117         1935     - 87
1892     +167         1914     - 61

We have to consider how far this residual series can be represented by an expression of the form

$$u_t = f(u_{t-1}, u_{t-2}, \ldots) + \varepsilon_t. \qquad (50.23)$$
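The residuals of Table 50.1 arise by subtracting from each term the mean of the nine terms centred upon it, four terms being lost at each end. A minimal sketch of that operation follows; the function name and the restriction to an odd span are assumptions of the sketch.

```python
def moving_average_residuals(x, span=9):
    """Residuals from a simple centred moving average of odd length `span`."""
    half = span // 2
    residuals = []
    for i in range(half, len(x) - half):
        window = x[i - half:i + half + 1]
        residuals.append(x[i] - sum(window) / span)
    return residuals
```

A series of 73 annual figures (1867 to 1939) thus yields 65 residuals (1871 to 1935), and an exactly linear trend is removed completely.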


The first ten serial correlations are as follows:

Order of                      Order of
correlation k     r_k         correlation k     r_k

     1            0.595            6            0.144
     2           -0.151            7            0.203
     3           -0.601            8            0.118
     4           -0.537            9            0.006
     5           -0.138           10           -0.078


We first of all consider what order of linear autoregressive scheme would be required. 
This is most easily decided in terms of partial correlations of u t with u t _ k eliminating all 
intervening observations, and the corresponding multiple correlations determined by 
(27.61). We find:

Value of        Partial r        Product of (1 - r^2)
lag k           of lag k         = 1 - R^2

   1              0.595            0.6460
   2             -0.782            0.2509
   3              0.097            0.2485
   4             -0.183            0.2402
   5              0.031            0.2400
   6              0.014            0.2400


There is apparently no appreciable gain in representation to be obtained by taking 
a linear autoregression of order greater than two. Note that the high values of | r i 
| r 4 | disappear upon partialling, whereas the small | r 2 1 is replaced by the largest 
partial | r |. 

We might, however, wish to examine the question whether curvilinear terms might improve the autoregression fit (even at the expense of rendering the model non-stationary). This is most clearly decided by drawing the scatter diagrams of $u_t$ on $u_{t-1}$ and of $u_t$ on $u_{t-2}$, which are shown in Fig. 50.1. There is no sign, to the eye at least, of curvilinearity in this scatter of variation. We conclude that, so far as autoregressive representation is possible, it is adequate to take the Yule scheme

$u_t = -\alpha_1 u_{t-1} - \alpha_2 u_{t-2} + \varepsilon_t$, (50.24)

in which the variance of $\varepsilon$ is about 25 per cent (i.e. the value of $1 - R^2$ above; cf. (27.56)) of the variance of $u$.

The constants $\alpha_1$ and $\alpha_2$ are easily estimated using (47.77-8) as

$-\alpha_1 = \dfrac{r_1(1 - r_2)}{1 - r_1^2} = 1.060, \qquad -\alpha_2 = \dfrac{r_2 - r_1^2}{1 - r_1^2} = -0.782,$

and the autoregressive equation is

$u_t = 1.060\,u_{t-1} - 0.782\,u_{t-2}$. (50.25)
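
The partial correlations and the constants of the Yule scheme can be recomputed from the ten serial correlations quoted above. The following is a minimal sketch, not the authors' own computation: it uses the Durbin-Levinson recursion (a standard way of obtaining partial autocorrelations, swapped in here for the route via (27.61)), and the function name `partial_autocorrelations` is purely illustrative.

```python
def partial_autocorrelations(r):
    """Durbin-Levinson recursion (an assumed, standard substitute for (27.61)).

    r[0] is the serial correlation of lag 1, r[1] of lag 2, and so on.
    Returns the partial correlation at each lag and the running
    product of (1 - partial^2), i.e. 1 - R^2.
    """
    rho = [1.0] + list(r)            # rho[k] = serial correlation of lag k
    phi = []                         # regression coefficients of current order
    partials, ratios = [], []
    ratio = 1.0
    for m in range(1, len(rho)):
        num = rho[m] - sum(phi[j] * rho[m - 1 - j] for j in range(m - 1))
        den = 1.0 - sum(phi[j] * rho[j + 1] for j in range(m - 1))
        pm = num / den               # partial correlation of lag m
        phi = [phi[j] - pm * phi[m - 2 - j] for j in range(m - 1)] + [pm]
        ratio *= 1.0 - pm * pm
        partials.append(pm)
        ratios.append(ratio)
    return partials, ratios


SERIAL = [0.595, -0.151, -0.601, -0.537, -0.138,
          0.144, 0.203, 0.118, 0.006, -0.078]
partials, ratios = partial_autocorrelations(SERIAL)

# Constants of the Yule scheme in the sign convention of (50.24)
r1, r2 = SERIAL[0], SERIAL[1]
alpha1 = -r1 * (1 - r2) / (1 - r1 ** 2)
alpha2 = -(r2 - r1 ** 2) / (1 - r1 ** 2)

for k, (p, q) in enumerate(zip(partials[:6], ratios[:6]), start=1):
    print(k, round(p, 3), round(q, 4))
print(round(alpha1, 3), round(alpha2, 3))
```

The first two lags give the partials 0.595 and about -0.782; the later lags can be compared with the tabulated values, allowing for rounding in the published serial correlations.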


Test of fit for autoregressive schemes

50.9 It so happens that for autoregressive schemes the partial autocorrelations can be obtained directly, a fact which was used by Quenouille (1947b) to provide an ingenious test of fit.

Corresponding to (50.22) consider a variable $\eta_t$ defined by

$\sum_{j=0}^{k} \alpha_j u_{t+j} = \eta_t$, (50.26)

where the $u$'s go forward, so to speak, instead of backward in time. We have

$\operatorname{cov}(\eta_t, \eta_{t+l}) = E\Big\{\Big(\sum_i \alpha_i u_{t+i}\Big)\Big(\sum_j \alpha_j u_{t+l+j}\Big)\Big\} = \sum_j \alpha_j \sum_i \alpha_i \gamma_{l+j-i}$, (50.27)

where $\gamma_p$ is the $p$th autocovariance of $u_t$. The second summation on the right vanishes in virtue of the Yule-Walker equations (47.66), for $l > 0$. The same result holds for $l < 0$. When $l = 0$ we find

$\operatorname{var} \eta_t = \operatorname{var} \varepsilon_t$. (50.28)

Thus, if $\varepsilon$ is a normal random variable, so is $\eta$, with the same variance. $\eta_t$ depends on $\varepsilon_{t+k}$ and previous $\varepsilon$'s and is therefore independent of $\varepsilon_{t+k+l}$ for $l > 0$.

Consider now quantities $q_j$ defined by

$q_j = \dfrac{1}{n}\sum_{t=1}^{n} \varepsilon_{t+j}\,\eta_t$. (50.29)

We have

$E(q_j) = 0, \quad j > k$, (50.30)

$\operatorname{var} q_j = \dfrac{1}{n} E(\varepsilon_{t+j}^2)\,E(\eta_t^2) = \dfrac{1}{n}(\operatorname{var}\varepsilon)^2, \quad j > k$, (50.31)

$\operatorname{cov}(q_j, q_{j+l}) = 0, \quad l \neq 0$. (50.32)

Define the quantities

$\omega_j = q_j/\operatorname{var} u = \dfrac{1}{n \operatorname{var} u}\sum_{t=1}^{n} \varepsilon_{t+j}(\alpha_0 u_t + \ldots + \alpha_k u_{t+k}), \quad j > k$. (50.33)

Then each $\omega_j$ has zero mean, variance equal to $(\operatorname{var}\varepsilon/\operatorname{var} u)^2/n$, and is uncorrelated with the other $\omega$'s.

We observe that $\varepsilon_{t+j}$ is $u_{t+j}$ after removal by regression of the terms $u_{t+j-1}, \ldots, u_{t+j-k}$. On the other hand, $\eta_t$ is $u_t$ after the removal of $u_{t+1}, \ldots, u_{t+k}$. Thus for $j > k$ the correlation between $\varepsilon_{t+j}$ and $\eta_t$, namely $\omega_j$, is the partial correlation of terms in the series distance $j$ apart.

Substituting for $\varepsilon$ in (50.33), using sample values for the serial correlations, we find

$\omega_j = \dfrac{1}{n \operatorname{var} u}\sum_{t=1}^{n} (\alpha_0 u_{t+j} + \ldots + \alpha_k u_{t+j-k})(\alpha_0 u_t + \ldots + \alpha_k u_{t+k})$

$\quad\;\, = A_0 r_j + A_1 r_{j-1} + \ldots + A_{2k} r_{j-2k}$, (50.34)

where the $A$'s are given by

$A_j = \sum_{s=0}^{j} \alpha_s \alpha_{j-s}$ (50.35)

(with $\alpha_s = 0$ for $s > k$).


Example 50.3

For the Yule scheme (47.74) we find, from (50.35),

$A_0 = 1, \quad A_1 = 2\alpha_1, \quad A_2 = \alpha_1^2 + 2\alpha_2, \quad A_3 = 2\alpha_1\alpha_2, \quad A_4 = \alpha_2^2,$

and hence, asymptotically,

$\omega_j = r_j + 2\alpha_1 r_{j-1} + (\alpha_1^2 + 2\alpha_2) r_{j-2} + 2\alpha_1\alpha_2 r_{j-3} + \alpha_2^2 r_{j-4}, \quad j > 2$, (50.36)

is distributed with variance

$\dfrac{1}{n}\left(\dfrac{\operatorname{var}\varepsilon}{\operatorname{var} u}\right)^2$. (50.37)

In Example 50.2 we found, for the series of residuals in the sheep series,

$\alpha_1 = -1.060, \quad \alpha_2 = 0.782.$

Substitution in (50.36) then gives

$\omega_j = r_j - 2.120\,r_{j-1} + 2.688\,r_{j-2} - 1.658\,r_{j-3} + 0.612\,r_{j-4},$

with variance $9.69 \times 10^{-4}$.

Considered as estimators of partial correlations in series of moderate length these quantities are rather indifferent, being affected by sampling fluctuations or casual errors. They rest on the assumption, of course, that we can use sample estimators of the $\alpha$'s. We find

$\omega_3 = 0.025, \quad \omega_4 = -0.043, \quad \omega_5 = -0.001,$

with a standard error of 0.031, and reach the same conclusion as in Example 50.2, that a second-order scheme is sufficient to account for the data.
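
The arithmetic of this example can be retraced in a few lines. The sketch below is illustrative only: the function name `quenouille_omegas` is hypothetical, the serial correlations are those quoted in Example 50.2, and the series length n = 65 is an assumption inferred from the quoted variance rather than a figure stated in the text.

```python
import math


def quenouille_omegas(alpha, r, lags):
    """omega_j = sum_m A_m r_{j-m}, with A_m = sum_s alpha_s alpha_{m-s}.

    alpha = [alpha_0, ..., alpha_k] with alpha_0 = 1;
    r[s] is the serial correlation of lag s (r[0] = 1, r_{-s} = r_s).
    """
    k = len(alpha) - 1
    A = [sum(alpha[s] * alpha[m - s]
             for s in range(m + 1) if s <= k and m - s <= k)
         for m in range(2 * k + 1)]
    return {j: sum(A[m] * r[abs(j - m)] for m in range(2 * k + 1))
            for j in lags}


r = [1.0, 0.595, -0.151, -0.601, -0.537, -0.138]   # lags 0..5
alpha = [1.0, -1.060, 0.782]                        # Yule scheme constants

omegas = quenouille_omegas(alpha, r, lags=(3, 4, 5))

n = 65                      # ASSUMED series length (not stated in the text)
one_minus_R2 = 0.2509       # var(eps)/var(u) from the table of partials
se = one_minus_R2 / math.sqrt(n)

for j in sorted(omegas):
    print(j, round(omegas[j], 3), round(omegas[j] / se, 2))
```

Each ratio of $\omega_j$ to its standard error is well inside the usual normal limits, which is the sense in which the second-order scheme is judged sufficient.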

Moving-average processes 

50.10 The difficulties we remarked upon in estimation of the constants in the 
pure moving-average process (50.11) are obviously intensified when averages of greater 
extent are concerned. Asymptotic expressions for the likelihood may be derived, but 
the ML equations are extremely cumbrous. We shall describe a method due to 
Durbin (1959b) which, in effect, turns the problem into one of autoregression. 

Consider, in fact, the simple model (50.11). This is equivalent to the infinite autoregression

$u_t - \beta u_{t-1} + \beta^2 u_{t-2} - \ldots = \varepsilon_t$. (50.38)

Compare this with the finite autoregressive scheme of order $k$,

$u_t + \alpha_1 u_{t-1} + \alpha_2 u_{t-2} + \ldots + \alpha_k u_{t-k} = \varepsilon_t$, (50.39)

with $\alpha_j = (-\beta)^j$.

The difference lies in the remainder after $k+1$ terms of the autoregression:

$(-\beta)^{k+1} u_{t-k-1} + (-\beta)^{k+2} u_{t-k-2} + \ldots = (-\beta)^{k+1}\varepsilon_{t-k-1}$. (50.40)

The variance of this term is $\beta^{2k+2}\operatorname{var}\varepsilon$, and for $|\beta| < 1$ this tends rapidly to zero as $k$ grows larger. Consequently the representation (50.39) can be made as close to (50.38) as we like by taking $k$ sufficiently large (but small compared to $n$).

Let $a_1, a_2, \ldots, a_k$ be the least-squares estimators of $\alpha_1, \alpha_2, \ldots, \alpha_k$ in (50.39). From the Mann-Wald theorem of 50.7, and (19.16), we know that the $(a - \alpha)$'s are asymptotically normal with zero mean and dispersion matrix equal to $V_k^{-1}/n$, where $V_k \operatorname{var}\varepsilon$ is the dispersion matrix of the regressor variables, namely of $u_{t-1}, u_{t-2}, \ldots, u_{t-k}$. This is given by the matrix of (50.12). Hence for the asymptotic distribution of $a_1, \ldots, a_k$ we have

$dF = \left(\dfrac{n}{2\pi}\right)^{k/2} |V_k|^{1/2} \exp\left[-\tfrac{1}{2} n \{(a - \alpha)' V_k (a - \alpha)\}\right] da_1 \ldots da_k$. (50.41)

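The expression in curly brackets, say $Q$, is the essential part of the likelihood function since $|V_k|$ is $(1 - \beta^{2k+2})/(1 - \beta^2)$ (cf. Exercise 50.1). We can simplify $Q$ to some extent. Consider the Yule-Walker equations (47.66) in the form

$\alpha_1\gamma_0 + \alpha_2\gamma_1 + \ldots + \alpha_k\gamma_{k-1} = -\gamma_1,$
$\alpha_1\gamma_1 + \alpha_2\gamma_0 + \ldots + \alpha_k\gamma_{k-2} = -\gamma_2,$
$\ldots$
$\alpha_1\gamma_{k-1} + \alpha_2\gamma_{k-2} + \ldots + \alpha_k\gamma_0 = -\gamma_k.$ (50.42)

Putting $\gamma_0 = (1 + \beta^2)\sigma^2$, $\gamma_1 = \beta\sigma^2$, $\gamma_j = 0$ $(j > 1)$, where $\sigma^2 = \operatorname{var}\varepsilon$, we have

$(1 + \beta^2)\alpha_1 + \beta\alpha_2 = -\beta,$
$\beta\alpha_{j-1} + (1 + \beta^2)\alpha_j + \beta\alpha_{j+1} = 0, \quad j = 2, \ldots, k \quad (\alpha_{k+1} = 0).$ (50.43)

Multiplying these equations by $-2a_j + \alpha_j$ in turn and adding, we find the expression

$Q = (1 + \beta^2)\sum_{j=1}^{k} a_j^2 + 2\beta\sum_{j=1}^{k-1} a_j a_{j+1} + 2\beta a_1 - \beta\alpha_1$. (50.44)

Since for large $k$, $\alpha_1$ is nearly equal to $-\beta$, this gives, on putting $a_0 = 1$,

$Q = (1 + \beta^2)\sum_{j=0}^{k} a_j^2 + 2\beta\sum_{j=0}^{k-1} a_j a_{j+1} - 1$. (50.45)

The estimator of $\beta$ is now given by differentiating $Q$ and equating to zero in the usual way, which gives us

$b = -\sum_{j=0}^{k-1} a_j a_{j+1} \Big/ \sum_{j=0}^{k} a_j^2$. (50.46)
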

50.11 This estimator is easily computed from the $a$'s, which in turn are derivable without difficulty from a regression routine. Durbin (1959b) showed, moreover, that to the first order in $n^{-1}$ (cf. Exercise 50.6)

$\operatorname{var} b = \dfrac{2}{n\,E(\partial^2 Q/\partial\beta^2)}$. (50.47)

In the present case $\partial^2 Q/\partial\beta^2 = 2\sum_{j=0}^{k} a_j^2$, the expectation of which for large $k$ tends to $2\sum_{j=0}^{\infty}\beta^{2j} = 2/(1 - \beta^2)$. Thus for sufficiently large $k$,

$\operatorname{var} b = \dfrac{1 - \beta^2}{n}$, (50.48)

and the estimator $b$ is asymptotically efficient.
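
The estimator of (50.46) is a one-line calculation once the autoregressive coefficients are available. The following is a minimal sketch under stated assumptions: the function name `durbin_ma1` is hypothetical, and instead of fitting a regression to data the coefficients are set to their limiting values $(-\beta)^j$, so that the output only checks the algebra, not the sampling behaviour.

```python
def durbin_ma1(a):
    """Estimate beta from fitted AR coefficients a = [a_1, ..., a_k].

    Coefficients follow the convention of (50.39); a_0 = 1 is prepended.
    Returns b of (50.46) and the value of Q in (50.45) at beta = b.
    """
    full = [1.0] + list(a)
    s0 = sum(x * x for x in full)                               # sum a_j^2
    s1 = sum(full[j] * full[j + 1] for j in range(len(full) - 1))
    b = -s1 / s0
    q_min = (1.0 - b * b) * s0 - 1.0
    return b, q_min


# Illustrative values only (not from the text)
beta, k, n = 0.5, 30, 200
a = [(-beta) ** j for j in range(1, k + 1)]   # limiting AR coefficients

b, q_min = durbin_ma1(a)
var_b = (1.0 - b * b) / n                     # (50.48) with b for beta
print(b, q_min, var_b)
```

With real data the `a` list would come from a least-squares autoregression of order `k`, and `n * q_min` would then be the quantity examined as a goodness-of-fit criterion.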


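50.12 Similar methods give corresponding results for the more general moving-average scheme. We state without proof the main results.

The model is

$u_t = \varepsilon_t + \beta_1\varepsilon_{t-1} + \ldots + \beta_h\varepsilon_{t-h}$, (50.49)

with the $\varepsilon$'s independently and identically (but not necessarily normally) distributed.

The asymptotic distribution of the least-squares estimators $a_1, \ldots, a_k$ is

$dF \propto \exp(-\tfrac{1}{2} nQ)\, da$, (50.50)

where $\sigma^2 B$ is the dispersion matrix of $u_{t-1}, \ldots, u_{t-k}$ and

$Q = (a - \alpha)' B (a - \alpha) = a'Ba - 2a'B\alpha + \alpha'B\alpha$. (50.51)

This simplifies to

$Q = a'Ba + 2a'c - \alpha'c$, (50.52)

where $c = (c_1, \ldots, c_k)$ and $c_j = \gamma_j/\sigma^2$, and again to

$Q = a'Ba + 2a'c + \sum_{j=1}^{h}\beta_j^2$. (50.53)

The estimators $b$ of $\beta$ are given by

$\begin{pmatrix} \sum_{0}^{k} a_j^2 & \sum_{0}^{k-1} a_j a_{j+1} & \cdots & \sum_{0}^{k-h+1} a_j a_{j+h-1} \\ \sum_{0}^{k-1} a_j a_{j+1} & \sum_{0}^{k} a_j^2 & \cdots & \sum_{0}^{k-h+2} a_j a_{j+h-2} \\ \vdots & \vdots & & \vdots \\ \sum_{0}^{k-h+1} a_j a_{j+h-1} & \sum_{0}^{k-h+2} a_j a_{j+h-2} & \cdots & \sum_{0}^{k} a_j^2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_h \end{pmatrix} = -\begin{pmatrix} \sum_{0}^{k-1} a_j a_{j+1} \\ \sum_{0}^{k-2} a_j a_{j+2} \\ \vdots \\ \sum_{0}^{k-h} a_j a_{j+h} \end{pmatrix}$. (50.54)

The asymptotic variance matrix of $b$ is approximately

$\dfrac{1}{n}\left[\tfrac{1}{2} E\left(\dfrac{\partial^2 Q}{\partial\beta_i\,\partial\beta_j}\right)\right]^{-1}$, (50.55)

which may be shown to be equal to $U/n$, where

$U = \begin{pmatrix} 1 - \beta_h^2 & \beta_1 - \beta_{h-1}\beta_h & \beta_2 - \beta_{h-2}\beta_h & \cdots & \beta_{h-1} - \beta_1\beta_h \\ \beta_1 - \beta_{h-1}\beta_h & 1 + \beta_1^2 - \beta_{h-1}^2 - \beta_h^2 & \cdots & & \vdots \\ \beta_2 - \beta_{h-2}\beta_h & \beta_1 + \beta_1\beta_2 - \beta_{h-2}\beta_{h-1} - \beta_{h-1}\beta_h & \cdots & & \vdots \\ \vdots & \vdots & & & \vdots \\ \beta_{h-1} - \beta_1\beta_h & \cdots & & & 1 - \beta_h^2 \end{pmatrix}$. (50.56)
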


50.13 The foregoing results provide a basis for the construction of large-sample tests of hypotheses. For example, in (50.11), to test the hypothesis that $\beta = \beta_0$ we calculate $b$ from (50.46) and test

$z = \dfrac{\sqrt{n}\,(b - \beta_0)}{(1 - \beta_0^2)^{1/2}}$ (50.57)

as a normal deviate $N(0, 1)$. Likewise, in the more general scheme (50.49) we test

$\chi^2 = n\sum_{i=1}^{h}\sum_{j=1}^{h} u^{ij}(b_i - \beta_i)(b_j - \beta_j)$ (50.58)

as a $\chi^2$ variable with $h$ d.fr. Here $(u^{ij})$ is the inverse of (50.56).

50.14 Again, a test of the goodness of fit of the whole model may be derived. With the simpler model (50.11) we find that $nQ$, where $Q$ is given by (50.45), is distributed approximately as $\chi^2$ with $k$ d.fr. Substitution in (50.45) from (50.46) shows that $Q$ can be partitioned in the form

$Q = (1 - b^2)\sum_{j=0}^{k} a_j^2 - 1 + (\beta - b)^2\sum_{j=0}^{k} a_j^2$. (50.59)

Asymptotically $n(b - \beta)^2\sum a_j^2$ is equivalent to the regression sum of squares in a linear regression model, the remainder being the residual sum of squares. Thus the goodness of fit of the model may be examined by testing

$n\Big\{(1 - b^2)\sum_{j=0}^{k} a_j^2 - 1\Big\}$ (50.60)

as $\chi^2$ with $k - 1$ d.fr.

For the more extended model (50.49) the minimum value of $Q$ is given by

$\sum_{j=0}^{k} a_j^2 + \sum_{r=1}^{h} b_r \sum_{j=0}^{k-r} a_j a_{j+r} - 1$, (50.61)

which, multiplied by $n$, can be tested as $\chi^2$ with $k - h$ d.fr.

For details and some numerical results see Durbin (1959b). Wold (1949) had earlier suggested a more complicated test. Durbin proved for the second-order case, and conjectured as a general result, that the limiting dispersion determinant of the autoregressive scheme $\sum_{j=0}^{h}\alpha_j u_{t-j} = \varepsilon_t$ is the same as the limiting dispersion determinant of the moving-average scheme $u_t = \sum_{j=0}^{h}\alpha_j\varepsilon_{t-j}$. This was proved by Finch (1960) and by A. M. Walker (1961). If there is any simple explanation of this remarkable duality it remains undiscovered.


Autoregressive schemes with moving-average errors 

50.15 Consider now the mixed scheme

$\sum_{j=0}^{k}\alpha_j u_{t-j} = \sum_{j=0}^{h}\beta_j\varepsilon_{t-j}$. (50.62)

The problem of finding efficient estimators of the $\alpha$'s and $\beta$'s has not been thoroughly investigated, but the most promising method, due to Durbin (1960b), seems to be to iterate in the following manner.

Suppose we have a set of values $a$ of $\alpha$. We can then transform the $u$'s to new variables $z$ by the autoregressive transformation

$z_t = \sum_{j=0}^{k} a_j u_{t-j}$ (50.63)

and estimate the constants $\beta$ in the model

$z_t = \sum_{j=0}^{h}\beta_j\varepsilon_{t-j}$. (50.64)

Having determined estimates $b$ of $\beta$, we can now transform

$\sum \alpha_j u_{t-j} = \sum b_j\varepsilon_{t-j}$

to autoregressive equations of the form

$\sum \alpha'_j u_{t-j} = \varepsilon_t$, (50.65)

where the $\alpha'$ are linear functions of the $\alpha$'s. We can then obtain estimates of the $\alpha'$'s and hence of the $\alpha$'s. The cycle can then begin again until the estimates of the $\alpha$'s and $\beta$'s converge.


50.16 The problems of applying this procedure are twofold: to ensure that the iterative procedure converges satisfactorily, and to find a good set of starting values. The first problem does not seem to have been thoroughly examined; in time-series analysis convergence is not a property which can be assumed without extensive practical testing. As to the second, we recall that the scheme (50.62) can be closely approximated by an autoregressive scheme of large order. If we fit such a scheme, let the residuals be $e_t$. In (50.62) we replace $\varepsilon_t$ by $e_t$ and hence obtain preliminary estimates of $\alpha$ and $\beta$ by minimizing

$\sum_t \Big(\sum_{j=0}^{k}\alpha_j u_{t-j} - \sum_{j=1}^{h}\beta_j e_{t-j}\Big)^2$. (50.66)


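Example 50.4

Consider the model

$u_t + \alpha u_{t-1} = \varepsilon_t + \beta\varepsilon_{t-1}$. (50.67)

Approximate by a scheme of order $k$ giving residuals $e_t$:

$e_t = u_t + a_1 u_{t-1} + \ldots + a_k u_{t-k}$. (50.68)

We minimize

$\sum (u_t + \alpha u_{t-1} - \beta e_{t-1})^2$

to obtain for estimators of $\alpha$ and $\beta$

$\sum u_t u_{t-1} + \hat\alpha\sum u_{t-1}^2 - \hat\beta\sum u_{t-1}e_{t-1} = 0$, (50.69)

$\sum u_t e_{t-1} + \hat\alpha\sum u_{t-1}e_{t-1} - \hat\beta\sum e_{t-1}^2 = 0$. (50.70)

Substituting for $e$ from (50.68) and replacing $\sum u_t u_{t-p}$ by $(\sum u^2)\, r_p$, we find the asymptotic expressions

$\hat\alpha = -\dfrac{a_1 r_2 + a_2 r_3 + \ldots + a_k r_{k+1}}{a_1 r_1 + a_2 r_2 + \ldots + a_k r_k}$, (50.71)

$\hat\beta = \hat\alpha + \dfrac{r_1 + a_1 r_2 + \ldots + a_k r_{k+1}}{1 + a_1 r_1 + \ldots + a_k r_k}$. (50.72)

From these the iterative solution may begin.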
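
The starting values of this example can be checked on a case where the answer is known. The sketch below is an illustration, not part of the original example: it takes the theoretical autocorrelations of a first-order mixed scheme with assumed constants $\alpha = -0.5$, $\beta = 0.4$, fits an autoregression of order 12 by the Durbin-Levinson recursion (standing in for a least-squares routine), and applies the two asymptotic expressions. The function names are hypothetical.

```python
def yule_walker(rho, k):
    """Coefficients phi_1..phi_k of u_t = sum phi_j u_{t-j} + e_t,
    obtained from rho[0..k] by the Durbin-Levinson recursion."""
    phi = []
    for m in range(1, k + 1):
        num = rho[m] - sum(phi[j] * rho[m - 1 - j] for j in range(m - 1))
        den = 1.0 - sum(phi[j] * rho[j + 1] for j in range(m - 1))
        pm = num / den
        phi = [phi[j] - pm * phi[m - 2 - j] for j in range(m - 1)] + [pm]
    return phi


def starting_values(rho, k):
    """Preliminary estimates of alpha and beta from rho[0..k+1]."""
    a = [-p for p in yule_walker(rho, k)]             # convention of (50.68)
    num = sum(a[j] * rho[j + 2] for j in range(k))    # a_1 r_2 + ... + a_k r_{k+1}
    den = sum(a[j] * rho[j + 1] for j in range(k))    # a_1 r_1 + ... + a_k r_k
    alpha_hat = -num / den
    beta_hat = alpha_hat + (rho[1] + num) / (1.0 + den)
    return alpha_hat, beta_hat


# Assumed true constants, in the convention u_t + alpha u_{t-1} = e_t + beta e_{t-1}
alpha, beta, k = -0.5, 0.4, 12
phi, theta = -alpha, beta
rho = [1.0, (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)]
while len(rho) < k + 2:
    rho.append(phi * rho[-1])      # rho_s = phi * rho_{s-1} for s >= 2

alpha_hat, beta_hat = starting_values(rho, k)
print(round(alpha_hat, 4), round(beta_hat, 4))
```

With sample serial correlations in place of the theoretical ones the same two lines of arithmetic give the values from which the iteration may be started.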


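50.17 The mixed scheme (50.62) with random values $\varepsilon$ has the autocovariance generator (cf. 47.15 and 47.18)

$G(z) = \operatorname{var}\varepsilon\;\dfrac{(\sum\beta_j z^j)(\sum\beta_j z^{-j})}{(\sum\alpha_j z^j)(\sum\alpha_j z^{-j})}$, (50.73)

and the corresponding spectral density is given (cf. 47.14), apart from a constant factor, by

$w(\alpha) \propto \dfrac{(\sum\beta_j e^{ij\alpha})(\sum\beta_j e^{-ij\alpha})}{(\sum\alpha_j e^{ij\alpha})(\sum\alpha_j e^{-ij\alpha})}$, (50.74)

where $\alpha$ without suffix denotes the frequency.

From estimates of the $\alpha$'s and $\beta$'s we can then determine the estimated spectral density. It is more relevant, perhaps, to consider whether the $\alpha$'s and $\beta$'s can be estimated from the observed spectrum. The question has been examined by Durbin (1961).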



50.18 A more general assault on the problems of hypothesis testing and estimation has been made by Whittle in a series of papers, particularly 1951, 1953a and 1953b. The methods, based for the most part on Maximum Likelihood considerations, are interesting and the results of considerable generality for stationary time-series with continuous spectra. They are, however, not very suitable for numerical work.

Some multivariate extensions 

50.19 Suppose that we have $p$ series, $u_i$, $i = 1, 2, \ldots, p$, observed at $n$ intervals of time or, in the continuous case, defined over a certain time-period. The value of $u_i$ at time $t$ will be denoted by $u_{it}$, and the set of $p \times n$ values of $u$ by a matrix $\mathbf{u}$. As in the univariate case, we regard this as the realization of a process, and our basic object is to determine what kind of a process it is and what are its parameters.

In general, any row vector of $\mathbf{u}$, considered as a single series, may contain trend, seasonal, or oscillatory movements. However, it makes for almost unmanageable complication to try to dissect each vector simultaneously into its constituents. We shall assume that trend and seasonal movements have been removed, leaving us with a multivariate stationary complex $\mathbf{u}$. We follow Quenouille (1957).

50.20 The covariance of $u_{it}$ and $u_{j,t-s}$ will be written $\gamma_{(ij)s}$ and the corresponding correlation $\rho_{(ij)s}$. The analogous observed quantities are $c_{(ij)s}$, $r_{(ij)s}$. For any given $s$ there are $\tfrac{1}{2}p(p+1)$ of these quantities arrayable in a square matrix which we write $\gamma_s$, $\rho_s$, $c_s$ or $r_s$ as the case may be. In the univariate case $\rho_s = \rho_{-s}$, but clearly $E(u_{it}u_{j,t+s})$ is not equal to $E(u_{it}u_{j,t-s})$ but to $E(u_{i,t-s}u_{jt})$. Hence we have

$\gamma_{-s} = \gamma_s'$. (50.75)

As usual with multivariate extensions, the number of parameters and estimators increases rapidly with $p$. We shall refer to $\gamma_{(ij)s}$ as the cross-covariance of $u_i$ and $u_j$. Likewise $\rho_{(ij)s}$ is a cross-correlation. Where necessary to distinguish between sample and parental values we shall use those words, although they may often be omitted when the symbols themselves make it clear which is under reference.

50.21 As in the univariate case, we shall be concerned with three types of model: autoregressive, moving-average, and mixed autoregressive-moving-average systems. Let $\varepsilon_{it}$ be a series of independent random elements. Corresponding to the univariate

$u_t = \sum_{s=0}^{h}\beta_s\varepsilon_{t-s}$

we now have

$u_{it} = \sum_{s=0}^{h}\sum_{j=1}^{p}\beta_{ijs}\,\varepsilon_{j,t-s}$ (50.76)

and a corresponding autoregressive scheme

$\sum_{s=0}^{k}\sum_{j=1}^{p}\alpha_{ijs}\,u_{j,t-s} = \varepsilon_{it}$. (50.77)

Writing $D$ for a shift operator such that

$D u_t = u_{t-1}$, (50.78)

we may express (50.76) in the form

$u_t = B(D)\,\varepsilon_t$ (50.79)

in matrix form, where $B(D) = \sum_{s=0}^{h} B_s D^s$.

Likewise, for the autoregressive scheme of (50.77) we may write

$A(D)\,u_t = \varepsilon_t$,

where $A(D) = \sum_{s=0}^{k} A_s D^s$.

We may write the solution of (50.82)

$u_t = A^{-1}(D)\,\varepsilon_t$.

The terms in $A$ and $B$ are polynomials in $D$. We also define

$\gamma = \sum_{s=-\infty}^{\infty}\gamma_s D^s$. (50.84)


50.22 Without loss of generality, we will choose the scales so that the $\varepsilon_i$ have zero mean, the same variance for all $i$, say $\sigma^2$, and are uncorrelated. Then we have

$\gamma_s = E(u_t u'_{t-s})$.

Substituting from (50.79), we find

$\gamma_s = E\Big\{\Big(\sum_{j=0}^{h} B_j\varepsilon_{t-j}\Big)\Big(\sum_{m=0}^{h}\varepsilon'_{t-s-m}B'_m\Big)\Big\}$

$\quad\; = \sum_{j=0}^{h}\sum_{m=0}^{h} B_j\,E(\varepsilon_{t-j}\varepsilon'_{t-s-m})\,B'_m$

$\quad\; = \sigma^2\sum_{m} B_{s+m}B'_m$

$\quad\; = $ coeff. of $D^s$ in $\sigma^2(\sum B_j D^j)(\sum B'_j D^{-j})$. (50.85)

Hence, apart from the factor $\sigma^2$, we may write

$\gamma = B(D)\,B'(D^{-1})$. (50.86)

Likewise, for the autoregressive equation

$\gamma = A^{-1}(D)\,A'^{-1}(D^{-1})$. (50.87)

It is easy to show that for a mixed scheme

$A u_t = B\varepsilon_t$,

$\gamma = A^{-1}(D)\,B(D)\,B'(D^{-1})\,A'^{-1}(D^{-1})$. (50.88)

These are the multivariate analogues of (50.73). In them the $D$'s may be regarded as dummy variables equivalent to what we have formerly written as $z$. Equations (50.86)-(50.88) are, in fact, covariance generating functions.


50.23 For the autoregressive scheme we have natural generalizations of the Yule-Walker equations (47.66). Postmultiplying the autoregressive equations $\sum_{s} A_s u_{t-s} = \varepsilon_t$ by $u'_{t-a}$ and taking expectations, we find

$\sum_{s=0}^{k} A_s\,E(u_{t-s}u'_{t-a}) = \sum_{s=0}^{k} A_s\gamma_{a-s} = 0, \quad a > 0$. (50.89)

The solution of these equations is not, however, an easy matter.


50.24 Apart from the ordinary problems which we encountered in the univariate case, there are two further complications for multivariate series. In the first place, there may exist linear relations among the variables, in which case the matrices $A$ or $B$ may become degenerate. Steps must then be taken to remove some of the variables.
Example 50.5 (Quenouille, 1957)

Consider

$u_{1t} = \varepsilon_{1t} + \varepsilon_{2,t-1},$
$u_{2t} = \varepsilon_{1t} + \varepsilon_{2t},$
$u_{3t} = \varepsilon_{2t} + \varepsilon_{2,t-1}.$ (50.90)

We have

$B(D) = \begin{pmatrix} 1 & D & 0 \\ 1 & 1 & 0 \\ 0 & 1 + D & 0 \end{pmatrix}$ (50.91)

and from (50.86)

$\gamma = B(D)\,B'(D^{-1}) = \begin{pmatrix} 2 & 1 + D & 1 + D \\ 1 + D^{-1} & 2 & 1 + D^{-1} \\ 1 + D^{-1} & 1 + D & 2 + D + D^{-1} \end{pmatrix}$. (50.92)

It is then found that $|\gamma| = 0$ and the matrix has rank 2. There must then be a linear relation among the variables. In this case we can almost determine it by inspection, but formally we should look for the zero latent root of $\gamma$ and its associated latent vector. The latter is proportional to $1 + D$, $-(1 + D)$, $(1 - D)$ and the relation is therefore

$(1 + D)u_{1t} - (1 + D)u_{2t} + (1 - D)u_{3t} = 0$

or

$u_{3t} = -u_{1t} - u_{1,t-1} + u_{2t} + u_{2,t-1} + u_{3,t-1}$. (50.93)
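
That the relation holds identically, whatever the values of the random elements, is easily confirmed numerically. The sketch below is illustrative only: the shocks are arbitrary integers (an assumption made so that the arithmetic is exact), and the series length is an arbitrary choice.

```python
import random

random.seed(0)
n = 200
# Integer shocks keep every sum exact; their distribution is immaterial here.
e1 = [random.randint(-49, 49) for _ in range(n)]
e2 = [random.randint(-49, 49) for _ in range(n)]

# The three series of the example, for t = 1, ..., n - 1
u1 = [e1[t] + e2[t - 1] for t in range(1, n)]
u2 = [e1[t] + e2[t] for t in range(1, n)]
u3 = [e2[t] + e2[t - 1] for t in range(1, n)]

# Discrepancy between u3 and the right-hand side of the linear relation
residuals = [
    u3[t] - (-u1[t] - u1[t - 1] + u2[t] + u2[t - 1] + u3[t - 1])
    for t in range(1, len(u3))
]
print(max(abs(x) for x in residuals))
```

Every discrepancy is zero, so that one of the three series is redundant and may be removed before fitting.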


50.25 Degeneracies are, in practice, the exception rather than the rule, and when they occur can be dealt with fairly easily, as in Example 50.5. More important, and more difficult to deal with, is the fact that equation (50.88) does not determine $A$ and $B$ uniquely, however good our estimates of $\gamma$. Write temporarily $F$ for $A^{-1}B$. Then (50.88) is equivalent to

$\gamma = F(D)\,F'(D^{-1})$. (50.94)

If $\Psi$ is any diagonal matrix with diagonal elements $\psi_i(D)/\psi_i(D^{-1})$, and if $J$ is any matrix such that $JJ' = I$, it is easily seen by substitution in (50.94) that $F(D)\Psi(D)J$ may replace $F(D)$. Thus, many different schemes may give rise to the same covariance matrix.


Example 50.6 (Quenouille, 1957)

Consider the matrices

$F_1 = \begin{pmatrix} 2 + D & D \\ 1 & 6 + D \end{pmatrix}, \qquad F_2 = \begin{pmatrix} 1 + 2D & -D \\ 5 & 3 + 2D \end{pmatrix},$

$F_3 = \dfrac{1}{5}\begin{pmatrix} 2 + 7D & 4 + 9D \\ -11 - 6D & 28 + 3D \end{pmatrix}, \qquad F_4 = \dfrac{1}{17}\begin{pmatrix} 14 + 37D & -5 - 12D \\ 55 + 30D & 1 + 84D \end{pmatrix}.$

It can easily be verified that

$|F_1| = (4 + D)(3 + D), \qquad |F_2| = (3 + D)(1 + 4D),$
$|F_3| = (4 + D)(1 + 3D), \qquad |F_4| = (1 + 4D)(1 + 3D),$

and that for each $F$

$\gamma = \begin{pmatrix} 6 + 2D + 2D^{-1} & 3 + 7D \\ 3 + 7D^{-1} & 38 + 6D + 6D^{-1} \end{pmatrix}.$

Furthermore, if we postmultiply the $F$'s respectively by orthogonal matrices $J_1$, $J_2$, $J_3$, $J_4$, $\gamma$ is unaltered.
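
The verification is tedious by hand but immediate by machine. The following sketch, offered only as a check and not as part of the original example, treats $D$ as an ordinary rational number and confirms that the four matrices give the same product $F(D)F'(D^{-1})$ at two arbitrarily chosen values; exact fractions are used so that equality can be tested without rounding.

```python
from fractions import Fraction as Fr


def F1(d):
    return [[2 + d, d], [Fr(1), 6 + d]]


def F2(d):
    return [[1 + 2 * d, -d], [Fr(5), 3 + 2 * d]]


def F3(d):
    return [[(2 + 7 * d) / 5, (4 + 9 * d) / 5],
            [(-11 - 6 * d) / 5, (28 + 3 * d) / 5]]


def F4(d):
    return [[(14 + 37 * d) / 17, (-5 - 12 * d) / 17],
            [(55 + 30 * d) / 17, (1 + 84 * d) / 17]]


def generating_matrix(F, d):
    """F(D) F'(D^-1) with the dummy variable D replaced by the number d."""
    P, Q = F(d), F(1 / d)
    return [[sum(P[i][m] * Q[j][m] for m in range(2)) for j in range(2)]
            for i in range(2)]


def gamma(d):
    """The common covariance generating matrix stated in the example."""
    return [[6 + 2 * d + 2 / d, 3 + 7 * d],
            [3 + 7 / d, 38 + 6 * d + 6 / d]]


checks = [generating_matrix(F, d) == gamma(d)
          for F in (F1, F2, F3, F4)
          for d in (Fr(2), Fr(-3, 7))]
print(all(checks))
```

Agreement at isolated values of $D$ is of course only a spot check; the identity itself follows by multiplying out the polynomials.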

50.26 It remains for consideration whether all the possible solutions of (50.84) are acceptable; for example, whether they all provide stationary series. So far as is known, some fairly stringent conditions must be imposed before we can derive a unique solution. The following treatment is due to Phillips (1959).

We consider the mixed scheme with independent residuals and assume (1) that $A$ is non-degenerate (which we can always ensure as in Example 50.5) and (2) that $|A|$, a polynomial in $D$, has different roots $\lambda_1, \lambda_2, \ldots, \lambda_m$. Then if $a(D)$ is the adjoint of $A(D)$, we may write (50.88) as

$\gamma = \dfrac{a(D)\,B(D)\,B'(D^{-1})\,a'(D^{-1})}{|A(D)|\;|A'(D^{-1})|}$. (50.95)

Expressing $|A|$ as the product $\prod_{r=1}^{m}(D - \lambda_r)$, we see that the right-hand side may be expressed in partial fractions

$\gamma = \sum_{r=1}^{m}\dfrac{K_r}{D - \lambda_r} + \sum_{r=1}^{m}\dfrac{K'_r}{D^{-1} - \lambda_r}$, (50.96)

where $K_r$ is a $p \times p$ matrix given, according to the usual theory of partial fractions, by

$K_r = \left[\dfrac{(D - \lambda_r)\,a(D)\,B(D)\,B'(D^{-1})\,a'(D^{-1})}{|A(D)|\;|A'(D^{-1})|}\right]_{D = \lambda_r}$. (50.97)

In (50.96) we do not want terms in positive powers of $D$, which implies the condition that $|B|$ is of lower degree than $|A|$.

For a simple root $\lambda_r$ the matrix $A(\lambda_r)$ is simply degenerate, and its adjoint $a(\lambda_r)$ is of unit rank, a known result in matrix theory. We may then write

$a(\lambda_r) = k_r h_r$, (50.98)

where $k_r$ is a $(p \times 1)$ column vector and $h_r$ is a $(1 \times p)$ row vector satisfying

$A(\lambda_r)\,k_r = 0$, (50.99)

$h_r\,A(\lambda_r) = 0$. (50.100)

In point of fact $K_r$ is also of unit rank. For if we define a $(1 \times p)$ row vector $l_r$ by

$l_r = \left[\dfrac{(D - \lambda_r)\,h_r\,B(D)\,B'(D^{-1})\,a'(D^{-1})}{|A(D)|\;|A'(D^{-1})|}\right]_{D = \lambda_r}$, (50.101)

we find on substituting (50.98) in (50.97)

$K_r = k_r l_r$. (50.102)

Now from (50.99) we have

$A(\lambda_r)\,K_r = 0, \quad r = 1, 2, \ldots, m$. (50.103)

Given, then, the matrix $\gamma$, we can express it in partial fractions and hence determine $K_r$ and $\lambda_r$. We then have the set of equations (50.103). The question is whether this set is enough to determine the coefficients in $A$ uniquely.

50.27 Consider first of all the case when all the scalar equations in $Au_t = B\varepsilon_t$ are of the same order $v$. We now impose two conditions: (a) that the elements in the leading diagonal of $A$ are of degree $v$ but that non-diagonal elements are of degree $v - 1$ at most; (b) that the elements of the corresponding row in $B$ are of lower degree than $v$ (this means that no terms in $\lambda$ arise as numerators in (50.96)). Without loss of generality we may suppose that the coefficient of $D^v$ in each diagonal term in $A$ is unity. $|A|$ is of degree $pv$ which is therefore equal to $m$. Any given row in $A$ then has $pv$ coefficients to be determined, and equation (50.103), for $m = pv$ values of $r$, provides a set of non-homogeneous independent equations. Thus the coefficients are uniquely determined.

When $A$ is determined we find $B(D)B'(D^{-1})$ from

$B(D)\,B'(D^{-1}) = A(D)\,\gamma\,A'(D^{-1})$, (50.104)

which is derived from (50.95) and (50.96). There remains an indeterminacy for $B$ itself. This can be resolved only by extraneous information.

50.28 If the equations in the system are not all of the same order we require still further conditions on $A$. Let the equations in $Au_t = B\varepsilon_t$ be arranged so that any subsequent equation is not of lower order than those preceding it. If $A$ and $B$ satisfy (50.88), so do $\mu A$ and $\mu B$, where $\mu$ is an arbitrary matrix of constants. We can add any row of $A$ to a later row without violating the condition that the non-diagonal elements be of lower degree than the diagonal elements. But we cannot add a later row to an earlier one. Thus $\mu$ can only be a triangular matrix with zeros above the diagonal.

We can make the system identifiable if we are prepared to assume that the elements $\varepsilon$ are not correlated from one equation to another, that is to say, that $B$ is diagonal. For then $B(D)B'(D^{-1})$ is diagonal, and hence the non-diagonal elements of $\mu BB'\mu'$ are zero. Writing $b_{ii}$ for the diagonal elements of $BB'$, the non-diagonal elements of $\mu B(D)B'(D^{-1})\mu'$ above the diagonal are found to be

$\mu_{11}b_{11}\mu_{21}, \quad \mu_{11}b_{11}\mu_{31}, \quad \mu_{11}b_{11}\mu_{41}, \quad \ldots$
$\mu_{21}b_{11}\mu_{31} + \mu_{22}b_{22}\mu_{32}, \quad \mu_{21}b_{11}\mu_{41} + \mu_{22}b_{22}\mu_{42}, \quad \ldots$
$\ldots$

Since $\mu_{11}b_{11}$ cannot vanish, $\mu_{21}$, $\mu_{31}$, etc. must do so; and hence $\mu_{32}$, $\mu_{42}$, etc.; and so on. Hence $\mu$ is diagonal and the equations are identifiable.

50.29 It will be evident enough that the problems associated with identifiability are formidable. We can, at least on a heuristic basis, estimate the covariance matrix $\gamma$ and hence the product $A^{-1}(D)\,B(D)\,B'(D^{-1})\,A'^{-1}(D^{-1})$. To proceed thence to the individual coefficients in $A$ and $B$ requires conditions on the problem which are not always easy to verify; and in any case appeal to extraneous knowledge is sometimes necessary to reach determinacy. One of the major outstanding problems of multivariate temporal systems, in fact, is to ensure that a model is unique; and this apart from sampling considerations.

Cross-spectra 

50.30 Just as we may consider the cross-correlations of series and obtain what 
might be called cross-correlograms, so we may examine the extension of spectrum 
analysis to the simultaneous variation of series. 

For any pair of series, say $u_1$ and $u_2$, we have a set of cross-correlations $\rho_{(12)s}$, $s = -\infty, \ldots, \infty$ and, in extension of the spectrum of a single series (47.21), may define a spectral density

$w_{12}(\alpha) = \sum_{s=-\infty}^{\infty}\rho_{(12)s}\exp(is\alpha)$. (50.105)

There is a corresponding spectral function $W(\alpha)$ defined over the range 0 to $\pi$. Conversely, as at (47.23),

$\rho_{(12)s} = \dfrac{1}{2\pi}\int_{-\pi}^{\pi} w_{12}(\alpha)\exp(-is\alpha)\,d\alpha$. (50.106)

In univariate formulae, owing to the symmetry typified by $\rho_s = \rho_{-s}$, sine terms disappear from expressions relating spectral density to covariances or correlations. In the multivariate case, $\rho_{(12)s}$ is not the same as $\rho_{(12)(-s)}$. Expansion of (50.105) gives us

$w_{12}(\alpha) = \rho_{(12)0} + \sum_{s=1}^{\infty}\{\rho_{(12)s}\cos s\alpha + \rho_{(12)(-s)}\cos s\alpha\} + i\sum_{s=1}^{\infty}\{\rho_{(12)s}\sin s\alpha - \rho_{(12)(-s)}\sin s\alpha\}$

$\qquad\;\; = c(\alpha) + iq(\alpha)$, say. (50.107)

50.31 The quantity $c(\alpha)$ is called the co-spectrum or co-spectral density. $q(\alpha)$ is called the quadrature spectrum or quadrature spectral density. Sometimes both these quantities are plotted against $\alpha$. The sum of squares $c^2 + q^2$ is called the amplitude of the spectrum. The standardized quantity

$C(\alpha) = \dfrac{c^2(\alpha) + q^2(\alpha)}{w_1(\alpha)\,w_2(\alpha)}$, (50.108)

where $w_1$ and $w_2$ are the spectral densities of $u_1$ and $u_2$, is called the coherence.

Phase relationships in the series are studied by three types of diagram: the phase diagram, plotting $\psi(\alpha)$ against $\alpha$, where

$\psi(\alpha) = \arctan\{q(\alpha)/c(\alpha)\}$; (50.109)

the Argand diagram, which plots $c(\alpha)/w_1(\alpha)$ as abscissa against $q(\alpha)/w_1(\alpha)$ as ordinate; and the gain diagram, plotting $R^2_{12}(\alpha)$ against $\alpha$, where

$R^2_{12}(\alpha) = \dfrac{w_2(\alpha)\,C(\alpha)}{w_1(\alpha)}$. (50.110)

A good computer programme will calculate and graph the quantities required. (50.108) and (50.110) are analogues of correlation and regression coefficients. Some further details are given by Granger and Hatanaka (1964).
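
The quantities just defined are simple to compute once a finite set of cross-correlations is in hand. The following is a minimal sketch, with hypothetical function names and made-up inputs: the cross-correlations are supplied as a mapping from lag (positive, zero or negative) to value, the two spectral densities are passed in as plain numbers, and `atan2` is used in place of the bare arc tangent so that the quadrant of the phase is resolved (a design choice, not something prescribed by the text).

```python
import math


def cross_spectrum(rho, freq):
    """Co-spectrum c and quadrature spectrum q at one frequency.

    rho maps each lag s to the cross-correlation of that lag; summing
    rho_s * exp(i s freq) over all lags gives c + i q.
    """
    c = sum(r * math.cos(s * freq) for s, r in rho.items())
    q = sum(r * math.sin(s * freq) for s, r in rho.items())
    return c, q


def summary(rho, freq, w1, w2):
    """Amplitude (c^2 + q^2), coherence and phase at one frequency."""
    c, q = cross_spectrum(rho, freq)
    amplitude = c * c + q * q
    return {
        "co": c,
        "quad": q,
        "amplitude": amplitude,
        "coherence": amplitude / (w1 * w2),
        "phase": math.atan2(q, c),
    }


# Made-up illustration: a pure one-step lead gives a phase equal to the frequency.
pure_lag = {1: 1.0}
out = summary(pure_lag, 0.7, 1.0, 1.0)

# A symmetric set of cross-correlations has no quadrature component.
symmetric = {-1: 0.3, 0: 0.5, 1: 0.3}
c_sym, q_sym = cross_spectrum(symmetric, 1.1)
print(out["phase"], q_sym)
```

In practice the sums would be truncated at some maximum lag and the estimates smoothed, matters which this sketch leaves aside.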

50.32 For multivariate series of the autoregressive or moving-average types there 
is a straightforward generalization of the relation between the covariance-generating 
function and the spectral density. We have, in fact, 

«<«) = (50.111) 

but this is not, in practice, a very useful formula. 

The generalization of spectra and cross-spectra to polyspectra for ^-dimensional 
time-series is discussed by Brillinger (1965). 

Example 50.7 

To give some idea of what cross-correlations and cross-spectra look like we take an artificial series constructed by Quenouille (1957), reproduced in Table 50.2. The series was constructed from

u u = u i,/-i-0-lu 2il _j + e 1( 

u 2t = 0 + t-i-Q-2uz tt _ x -\-e z A ( 50 . 112 ) 

u 3t = 0-9u 2i *-i + %. 

The e’s are rectangular random variables ranging from -49 to +49. 

The accompanying diagrams show the logarithms of the spectral ordinates of the three series and the logarithms of the amplitudes of the cross-spectra. The series are effectively Markovian. Their spectra exhibit much the same pattern as the schemes themselves.

50.33 It hardly needs to be stated that the problems of estimation and hypothesis testing for multivariate series are much more complicated than in the univariate case, which themselves, as we have seen, are far from simple.

Scrutiny of the correlograms of the series will usually suggest whether the series is stationary, whether some simple scheme is likely to be adequate, or whether some more elaborate scheme may be required to explain observation. The basic elements used in deciding such questions are the serial correlations and cross-correlations.

Table 50.3 - Values of $\gamma_s$ and $c_s$ for the series of Table 50.2
(Values underlined are the values of the determinants of the corresponding matrices)



■7 878*14 4,392*92 
14*392*92 6,191*72 

r %%£ 85gl 

U’935 63 5,572-55 4,509-4lJ 
1-4106 Xl0 10 

-6 943-89 3^21703 2,501-31n 
*251-32 4; 6 50-15 3,166-8 
.454-67 5,010-45 3,546-13J 
3-8087 xlO 9 

•6,418-76 2,752-01 2,184-68 

303-69 3,790-42 2,602-81 
,726-19 4,185-14 2,849-74. 

1- 0283 xlO 9 

5,888-39 2,372-97 1,924-39 
,169-59 3,085-29 2,184-82 
773-33 3,411-38 2,342-53 

2- 7765 x 10 8 

5,371-43 2,064-44 1,705-91 
,915-27 2,536-47 1,866-94 
652-63 2,776-76 1,966-34 
7-4966 xlO 7 


T 6 ’ 

5 ' 

U, 

r 6, 

5 , 

u, 


ft 

U, 


L4, 


:] 


r 6 > 

4 , 

l_ 3 , 


3,511-74 
5,573 


•74-| 

•84 

•75J 


r 6, 

L3, 

r 6, 

5, 

U 

h rs, 

k 

J l4, 

1 r 5 > 

: 4, 

J l_ 4 , 


6,896-05 4,413-37 

,413-37 6,625-07 
,511-74 5,573-84 6,161-75 
3-8320 XlO 10 

6,487-04 3,877-69 3,097-75 
822-82 5,984-09 4,624-31 
,982-18 5,944-23 4,979-27. 
8-240 2 xlO 9 

6,053-67 3,384-15 2,784-75 
,008-42 5,175-63 3,681 

,421-45 5,315-40 3,925 
3-4807 xlO 9 

5,589-50 2,865-88 2,292-22 
877-75 4,224-71 2,932-37 

510-29 4,549-42 3,332-35 
2-6326 xlO 9 

■5,128-65 2,388-92 1,928-00' 
698-35 3,511-09 2,359-65 
376-84 3,819-96 2,824-63- 
2-5781 xlO 9 


3 

- 75-1 

•88 

• 26 J 

3 


r4,602-27 2,096-25 1,825-86 
4,549-36 2,770-88 1,835-51 

1-4,211-30 3,140-85 2,009-68 
9-1749x10® 


Correlations (decimal points omitted)

Order of      Series 1  Series 2  Series 3  2 leading  3 leading  1 leading  3 leading  1 leading  2 leading
correlation     auto      auto      auto        1          1          2          2          3          3

     0          1000      1000      1000       598        459        598        855        459        855
     1           933       898       787       677        551        512        929        389        693
     2           860       769       600       714        635        434        825        341        534
     3           782       617       496       697        655        354        697        258        408
     4           707       500       404       671        638        278        574        194        307
     5           617       377       257       650        614        229        457        177        218
     6           525       276       169       603        612        190        309        127        161
     7           436       201       113       544        557        146        213        080        113
     8           353       159       080       445        476        102        152        072        059
     9           281       093       054       351        392        072        126        031        023
    10           227       053       016       287        304        033        066       -017       -025
    11           201      -014      -026       233        266       -012        034       -058       -116
    12           174      -090      -099       155        219       -070        001       -085       -179
    13           117      -177      -116       086        144       -116       -046       -143       -232
    14           068      -258      -110       015        083       -172       -091       -216       -283
    15           005      -316      -162      -036        031       -252       -159       -271       -350
    16          -045      -361      -260      -096       -006       -318       -218       -305       -374
    17          -110      -404      -276      -120       -059       -362       -253       -357       -388
    18          -179      -419      -271      -137       -063       -416       -270       -425       -397
    19          -259      -438      -234      -192       -049       -471       -276       -474       -412
    20          -323      -462      -330      -279       -094       -516       -291       -493       -429
    21          -381      -459      -382      -345       -159       -535       -358       -501       -482
    22          -434      -535      -414      -391       -214       -545       -411       -497       -493
    23          -470      -548      -401      -431       -259       -538       -448       -507       -485
    24          -467      -531      -406      -485       -321       -517       -459       -465       -444
    25          -473      -476      -397      -501       -345       -476       -449       -457       -407


Fig. 50.3—Amplitude of cross-spectra of the three series of Table 50.2 

The ordinate is the logarithm to base 10 of the cross-spectral density divided by the square root of the product of the variances of the corresponding series.

The covariance determinant takes over the role of the autocorrelation determinants. Tests might be based on these quantities, but the sampling theory has not been developed and we must be content to manage with somewhat imprecise, though intuitively reasonable, procedures.


50.34 Let us begin, then, with a consideration of the covariance determinant $|\gamma|$. If the scheme is one of moving averages, all such determinants vanish beyond a certain value of $r$, say. If the scheme is autoregressive, there are relations between the successive matrices $\gamma_s$. If the scheme is of Markoff type,

$$|\gamma_s| = \beta^s\,|\gamma_0| \tag{50.113}$$

(cf. (47.72)), where

$$\beta = |A_1| / |A_0|, \tag{50.114}$$

and if it is of the Yule type (cf. Example 47.8),

$$\begin{vmatrix} \gamma_{s+1} & \gamma_s \\ \gamma_s & \gamma_{s-1} \end{vmatrix} = \beta^s \begin{vmatrix} \gamma_1 & \gamma_0 \\ \gamma_0 & \gamma_1' \end{vmatrix}, \tag{50.115}$$

where

$$\beta = |A_2| / |A_0|. \tag{50.116}$$

Unfortunately these relations do not work well in practice because of the high degree 
of sampling variation which obscures the true facts. 

For example, with the series of Example 50.7 the values of $|c_s|$, and those of $|\gamma_s|$ for $s = 0, \ldots, 5$, are

    s               0             1             2            3            4            5

    |gamma_s|   5.225 x 10^10  1.411 x 10^10  3.809 x 10^9  1.028 x 10^9  2.777 x 10^8  7.497 x 10^7

    |c_s|       3.832 x 10^10  8.240 x 10^9   3.481 x 10^9  2.633 x 10^9  2.578 x 10^9  9.175 x 10^8


The values fluctuate too much to provide a very clear guide. 
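
As a rough illustration of why the values are a poor guide, the ratios of successive determinants can be computed from the figures quoted above. The sketch assumes that the first row of the table holds $|\gamma_s|$ and the second $|c_s|$; the function name `successive_ratios` is ours, not the text's.

```python
# Determinant values quoted in the text for s = 0, ..., 5.
gamma_det = [5.225e10, 1.411e10, 3.809e9, 1.028e9, 2.777e8, 7.497e7]  # |gamma_s|
c_det = [3.832e10, 8.240e9, 3.481e9, 2.633e9, 2.578e9, 9.175e8]       # |c_s|


def successive_ratios(values):
    """Return the ratios values[s + 1] / values[s] of neighbouring entries."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


gamma_ratios = successive_ratios(gamma_det)
c_ratios = successive_ratios(c_det)

# Print the two sets of ratios side by side for comparison.
for s, (g, c) in enumerate(zip(gamma_ratios, c_ratios)):
    print(f"s = {s}: |gamma| ratio = {g:.3f}, |c| ratio = {c:.3f}")
```

The $|\gamma_s|$ ratios are nearly constant, as a relation of the Markoff kind would require, whereas the $|c_s|$ ratios vary widely from one lag to the next.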


50.35 A further possibility is to consider the ratios ... For instance, with a Markoff scheme we should expect the sequence of the determinants of such values to ... (50.113). It seems, however, that they fluctuate considerably.

For further work reference may be made to Bartlett and Rajalakshman ... and to the monograph by Quenouille (1957), who generalizes the test of 50.9 to the multivariate case.


Systems of equations 

50.36 ... specification in terms of a ... of the model of a system we are usually led ... simplicity we shall suppose that these relations are equations (and not, for example, inequalities) and that they are linear. In practice this latter condition is not so restrictive as it might appear; sometimes we can get rid of curvilinearities by variate transformations, sometimes a curvilinear relation can be replaced by linear ones in the way a curve may be replaced approximately by segments of straight lines.

50.37 Outside of the physical sciences, exact mathematical relations of a deterministic kind are rare. In the typical situation we have linear relations among variables which are inexact in the sense that error terms are present. For example, consider the simple assumed relation between two observed variables $y$ and $x$

$$y = \beta x. \tag{50.117}$$

This may be inexact for at least three reasons: (1) the relationship between $y$ and $x$ is not linear; (2) the observed variables are subject to errors of observation, in which case the true relationship applies to unobservable variables $\eta$ and $\xi$; (3) the relation is exact as far as it goes, but there are other variables also influencing $y$ and the correct relation is

$$y = \beta x + \epsilon, \tag{50.118}$$

where $\epsilon$, at this stage, merely stands for something unknown which we cannot specify more explicitly.


50.38 Equation (50.118) is a structural relation among variables which are not 
necessarily stochastic. It is not a regression equation. However, when faced with 
such relations in practice it is not unreasonable to postulate that $\epsilon$ behaves like a random
variable, and to depart from that assumption only when evidence about the actual
behaviour of $\epsilon$ is accumulated. We shall, moreover, assume that variables $y$ and $x$
are not subject to errors of observation.

We are thus led to consider systems of equations of linear type which do not
incorporate errors of observation but do incorporate a stochastic element. Our object
is to use the observations to estimate the constants in these equations and the variances 
of the stochastic terms. We have already considered some systems of the kind: 
(a) regressions with independent errors, (b) autoregressions with independent errors, and 
(c) autoregressions with moving-average errors. We proceed to consider briefly two 
other types: (d) regressions with autocorrelated errors, and (e) mixed regressive- 
autoregressive systems. 


Regression with autocorrelated errors 

50.39 This case appears to have been first discussed in any detail by Cochrane 
and Orcutt (1949), who pointed out that least-squares estimation was not free from bias 
when the error terms were correlated. A test for the existence of such correlation 
was provided by Durbin and Watson (1950-1). Exact results are difficult to obtain, 
but Durbin and Watson set up a test statistic which, in effect, falls between two other 
statistics, each of which follows R. L. Anderson’s distribution (48.8). See also Watson 
(1955) and Watson and Hannan (1956). 


50.40 Consider a regression of $y$ on fixed $x$'s,

$$y_t = \beta_1 x_{1t} + \cdots + \beta_q x_{qt} + u_t, \qquad t = 1, 2, \ldots, n, \tag{50.119}$$

where $u_t$ is written instead of the usual $\epsilon_t$ to denote that it is autocorrelated. We may often, without serious error, represent the autocorrelation structure of $u$ by supposing it to be autoregressive:

$$\sum_{j=0}^{k} \alpha_j u_{t-j} = \epsilon_t. \tag{50.120}$$

If the $\alpha$'s were known we could transform (50.119) to

$$\sum_{j=0}^{k} \alpha_j y_{t-j} = \sum_{l=1}^{q} \beta_l \sum_{j=0}^{k} \alpha_j x_{l,t-j} + \sum_{j=0}^{k} \alpha_j u_{t-j}, \tag{50.121}$$

namely to

$$y'_t = \sum_{l=1}^{q} \beta_l x'_{lt} + \epsilon_t, \tag{50.122}$$

where

$$y'_t = \sum_{j=0}^{k} \alpha_j y_{t-j}, \tag{50.123}$$

$$x'_{lt} = \sum_{j=0}^{k} \alpha_j x_{l,t-j}. \tag{50.124}$$

Equation (50.122) is now an ordinary regression. Cochrane and Orcutt (1949), to
whom this so-called "autoregressive transformation" is due, suggest guessing values
of $\alpha$, estimating $\beta$ from (50.122), and iterating the process if necessary by recalculating
residuals and finding a further approximation to the $\alpha$'s.
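
The iteration suggested by Cochrane and Orcutt can be sketched in a few lines for the simplest case. The sketch below is ours and makes several assumptions not in the text: a single regressor with no constant term, a first-order scheme for the errors written as $u_t = \rho u_{t-1} + \epsilon_t$ (so that $\alpha_0 = 1$ and $\alpha_1 = -\rho$), a starting guess of zero, and simulated data in which the $x$'s are drawn once and then treated as fixed.

```python
import random


def ols_slope(x, y):
    """Least-squares slope through the origin: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def cochrane_orcutt(x, y, iterations=10):
    """Alternate between estimating beta and the error coefficient rho.

    Assumed model: y_t = beta*x_t + u_t with u_t = rho*u_{t-1} + eps_t.
    """
    rho = 0.0  # initial guess: no autocorrelation in the errors
    for _ in range(iterations):
        # Autoregressive transformation of both variables.
        y_star = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        x_star = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        beta = ols_slope(x_star, y_star)
        # Recalculate residuals from the untransformed regression,
        # then regress each residual on its predecessor.
        u = [yt - beta * xt for xt, yt in zip(x, y)]
        rho = ols_slope(u[:-1], u[1:])
    return beta, rho


# Simulated illustration with hypothetical parameter values.
random.seed(0)
n, true_beta, true_rho = 2000, 2.0, 0.6
x = [random.gauss(0.0, 1.0) for _ in range(n)]
u, y = 0.0, []
for t in range(n):
    u = true_rho * u + random.gauss(0.0, 1.0)
    y.append(true_beta * x[t] + u)

beta_hat, rho_hat = cochrane_orcutt(x, y)
print(round(beta_hat, 2), round(rho_hat, 2))
```

With several regressors or a higher-order scheme the two least-squares steps become multiple regressions, but the alternation between them is unchanged.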

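Durbin (1960b) has proposed an alternative procedure which yields asymptotically
efficient estimators. Writing $\gamma_{lj} = \beta_l \alpha_j$ we put (50.121) in the form

$$\sum_{j=0}^{k} \alpha_j y_{t-j} = \sum_{l=1}^{q} \sum_{j=0}^{k} \gamma_{lj} x_{l,t-j} + \epsilon_t. \tag{50.125}$$

If the $y$'s were independent we could, as indicated below, regard this asymptotically
as a regression of $y_t$ on the other $y$'s and the $x$'s, and derive least-squares estimators of
the $\alpha$'s and $\gamma$'s. If the corresponding estimators of $\alpha$, $\beta$, $\gamma$ are $a$, $b$, $c$ we have, in virtue of
(50.119) and taking $\alpha_0 = a_0 = 1$,

$$y_t + \sum_{j=1}^{k} a_j y_{t-j} - \sum_{l=1}^{q} \sum_{j=0}^{k} c_{lj} x_{l,t-j} = u_t + \sum_{j=1}^{k} a_j u_{t-j} - \sum_{l=1}^{q} \sum_{j=0}^{k} (c_{lj} - a_j \beta_l) x_{l,t-j}. \tag{50.126}$$

Hence the $a$'s and $(c - a\beta)$'s are least-squares coefficients of regression on $u_{t-j}$ and
$x_{l,t-j}$. Consequently the quantities $a_j - \alpha_j$ and $c_{lj} - a_j \beta_l$ are asymptotically normal
with an ascertainable dispersion matrix. We can therefore write down
their likelihood and maximize it to obtain estimators of $\alpha$ and $\beta$.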

««0)-but tests of hypothesesTre fmpahed ' > “ d L ’ “ d T ' W ' 

Mixed autoregressive-regressive systems 

50.41 Consider now the case where an autoregressive set of $y$'s is regressed on
fixed $x$'s:

$$\sum_{j=0}^{k} \alpha_j y_{t-j} = \sum_{l=1}^{q} \beta_l x_{lt} + \epsilon_t. \tag{50.127}$$

We can express this in a form similar to (50.126):

$$y_t = -\sum_{j=1}^{k} \alpha_j y_{t-j} + \sum_{l=1}^{q} \beta_l x_{lt} + \epsilon_t. \tag{50.128}$$

However, this is not a regression with fixed variables on the right, owing to the
appearance of the lagged $y$'s. Durbin showed (1960a) that asymptotically the properties of
least-squares estimators in such a system are the same as those without lagged variables, 
whether or not the residuals are normally distributed. This is a natural extension of 
the Mann-Wald theorem mentioned in 50.7. 
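
A small simulation gives some feeling for this asymptotic result. The sketch below is ours, not the authors': it assumes a first-order case of (50.128) with one regressor, $y_t = -\alpha_1 y_{t-1} + \beta x_t + \epsilon_t$, with hypothetical values $\alpha_1 = -0.5$ and $\beta = 1.5$, and it applies ordinary least squares as though the lagged $y$ were a fixed variable.

```python
import random


def least_squares_two(z1, z2, y):
    """Solve the 2x2 normal equations for y = b1*z1 + b2*z2 + error."""
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s1y = sum(a * c for a, c in zip(z1, y))
    s2y = sum(b * c for b, c in zip(z2, y))
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


# Hypothetical parameters: alpha_0 = 1, alpha_1 = -0.5, beta = 1.5.
random.seed(1)
n, alpha1, beta = 3000, -0.5, 1.5
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(-alpha1 * y[t - 1] + beta * x[t] + random.gauss(0.0, 1.0))

# Regress y_t on y_{t-1} and x_t; the first coefficient estimates -alpha_1.
coef_lag, coef_x = least_squares_two(y[:-1], x[1:], y[1:])
print(round(coef_lag, 2), round(coef_x, 2))
```

In a long series the two coefficients settle close to $-\alpha_1$ and $\beta$, in line with the statement that the lagged variables do not alter the asymptotic properties.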

50.42 We shall not have the space to develop any further the theory of estimation 
and testing in statistical models, a subject of major importance which is full of pitfalls. 

Some general comments may, however, be useful. 

(a) It is important to remember which variables are being treated as “ fixed ” and 
which are, by their own nature or by the way in which the model is written, 
stochastic. This is particularly true when equations in these variables are being 
manipulated. For example, if we denote the random variable by a lower-case letter 
and a fixed variable by a capital, the regression 

$$y = \beta X + \epsilon \tag{50.129}$$

is not the same thing as

$$X = \frac{1}{\beta}\,y - \frac{1}{\beta}\,\epsilon. \tag{50.130}$$

(b) The point becomes of particular interest in time-series wherein the same variable
$u_t$ may occur in lagged form $u_{t-1}$, $u_{t-2}$, etc. In the equation

$$u_t = \beta u_{t-1} + \epsilon_t \tag{50.131}$$

we should usually regard both $u_t$ and $u_{t-1}$ as random variables. However, at time $t$,
$u_{t-1}$ has already occurred and is known. It is thus not random in one sense; for
example, if (50.131) is regarded as a predictive equation, we are interested in the
conditional variable $u_t \mid u_{t-1}$, not the joint distribution of $u_t$ and $u_{t-1}$.

(c) It will be clear, and was forcibly brought to notice by Haavelmo (1943), that 
estimation of the constants in a subset of equations, instead of the whole set, may 
result in bias. Thus there is always a further source of error in estimation which 
must not be forgotten—we may have omitted part of the model. 

(d) The nature of the data available sometimes leads to the specification of incorrect
models. For example, the demand for a commodity influences its price, and its
price influences supply. But to write

$$d_t = f(p_t), \qquad p_t = g(s_t) \tag{50.132}$$

overlooks a fundamental property of the system, in that there may be a lag before 
a change in one variable affects the other. The lag may be so short that its effect 
does not appear in any statistical evidence we are able to collect; but to ignore it is 
to destroy the utility of the model. 

50.43 A Scandinavian school led by Wold (1964) has insisted on confining economic
models to what is known as the "causal-chain" approach, and much of what they have
to say is relevant to the general problem of analysing dynamic systems. The phenomenon
under study is conceived of as a chain of causation. A behaviour variable
(observable) is subject to causal influences specified by a number of explanatory variables
and is influenced by other behaviour variables only through the explanatory ones.
Theoretically, perhaps, relations expressing the dependence of behaviour variables on
explanatory ones should be lagged in time. But when this is not possible the equations
are to be regarded as asymmetrical and read from left to right, e.g. the dependence of
price on demand, say the simple linear equation

$$p = \alpha d, \tag{50.133}$$

is not invertible to give

$$d = \frac{1}{\alpha}\,p. \tag{50.134}$$

The literature on model building is scattered, inadequate, and incomplete. That
on the statistical analysis of models is worse. A monograph by Fisk (1966) gives a
useful account of problems associated with sets of equations. For the causal-chain
method see the collection of papers edited by Wold (1964).

Forecasting 

50.44 One of the main objects of time-series analysis is to be able to predict the 
behaviour of the system under study over some future period of time; or at least to 
be able to see whether prediction within acceptable limits of error is possible. ’ 
Two approaches to the problem are available. In the first we adopt a purely
statistical approach; the past behaviour of a series is studied, and on the assumption
that the generating system is constant an attempt is made to project the series into
the future without a detailed study of the generating system itself. Thus, given an
autoregressive series and having estimated its constants, we may write, for example,

• . , (50.1351 

and estimate u t , (a) by substituting the known values of u- and u in tbic 

and (b) by assuming that the best estimate we can mak ‘ol he ttu ban ZTX 

s rr :;r r,ir s? tr , ™ u -is 

estimate of ' d We may put “"^ence intervals round the 

l 

... (a) that the ... (or other chosen scheme) is a good approximation to the effect of the true generating mechanism, and (b) that such mechanism is not changing enough to impair the supposition that we may use the equation based on past experience to represent its behaviour in the future. If, however, we wish to go more deeply into the nature of the generator, we must set up a model; that is to say, we must try to write down in specific form the relationships which condition the motion of the system. This is a more complicated exercise, involving on the one hand a much greater insight into the causal mechanisms at work, and on the other hand a lot ... the various quantities involved. The tendency has been for statisticians to prefer the simpler approach and to extrapolate from past experience without attempting to set up a model. This may well be the more rewarding approach for prediction in the short term. But it does not enable us to predict what would happen if we altered the system.


50.46 If it has been found that an autoregressive scheme or a scheme of regression 
satisfactorily fits past experience, there remains little to be said about the forecasting 
problem. We merely use the authenticated relationship to predict future values. This 
can be done for any form of time-series. If it has been decomposed into elements 
such as trend, seasonal, and oscillatory series, we predict the future of each element 
and reassemble them to forecast the future of the original series. As we remarked at 
an earlier stage, the underlying supposition is that the various elements are causally 
independent. 


50.47 In practice it is often found that schemes of order two are as satisfactory 
as such schemes can be, i.e. little is gained by adding extra terms. In fact, a good 
deal of attention has been given to the case where the scheme is of order one, namely 
is a Markoff scheme. The prediction equation is then very simple but possibly too 
simple. A heuristic approach suggested by Holt (1957) has some attractive features. 

We consider a scheme of autoregressive type, 

$$u_{t+1} = \alpha u_t + \alpha(1-\alpha)u_{t-1} + \cdots + \alpha(1-\alpha)^k u_{t-k} + \epsilon_{t+1}. \tag{50.136}$$

Considered as a predictor this has a certain intuitive appeal if $|1-\alpha| < 1$, for then
the terms contribute less and less to $u_{t+1}$ as we go back in time. If we estimate $u_{t+1}$
by the systematic component of (50.136), i.e. ignore $\epsilon_{t+1}$, we have

$$\begin{aligned} \operatorname{Est} u_{t+1} - \operatorname{Est} u_t &= \alpha \sum_{j=0}^{k} (1-\alpha)^j u_{t-j} - \alpha \sum_{j=0}^{k} (1-\alpha)^j u_{t-1-j} \\ &= \alpha(u_t - \operatorname{Est} u_t) - \alpha(1-\alpha)^{k+1} u_{t-k-1}. \end{aligned} \tag{50.137}$$

For $|1-\alpha|$ not too close to unity and moderately large $k$ we may write

$$\operatorname{Est} u_{t+1} - \operatorname{Est} u_t = \alpha(u_t - \operatorname{Est} u_t) = \alpha e_t. \tag{50.138}$$

Suppose $\alpha$ known. At any time-point $t+1$ we know $e_t$; it was the error of estimate at
time $t$. Thus we simply estimate $u_{t+1}$ by taking the estimate at time $t$ and adding $\alpha e_t$.
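
The updating rule just described is easily put into code. The following is a minimal sketch, not a definitive implementation: the function names, the short illustrative series and the choice of starting estimate are our assumptions. Starting from an estimate of zero, the recursion reproduces exactly the weighted sum of all past values with weights $\alpha(1-\alpha)^j$.

```python
def ewma_forecasts(series, alpha, initial=None):
    """One-step-ahead estimates from the rule: new = old + alpha * error.

    `initial` is the starting estimate; the first observation is used
    when none is supplied (an assumption, not part of the text).
    """
    forecast = series[0] if initial is None else initial
    forecasts = []
    for u in series:
        forecasts.append(forecast)          # estimate made before u is seen
        forecast += alpha * (u - forecast)  # add alpha times the latest error
    return forecasts, forecast              # past estimates and the next one


def direct_weighted_sum(series, alpha):
    """Weighted sum with weights alpha*(1-alpha)^j over all past values."""
    return sum(alpha * (1 - alpha) ** j * u
               for j, u in enumerate(reversed(series)))


series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]  # illustrative values only
past, next_estimate = ewma_forecasts(series, alpha=0.3, initial=0.0)
print(next_estimate, direct_weighted_sum(series, 0.3))
```

The recursion needs only the previous estimate and the latest observation, which is the practical attraction of the method.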


50.48 The estimation of the parameter a is not a simple matter. The most 
straightforward approach, given enough computational assistance, is to try a range of 
values of a and to calculate the sum 

$$\sum \{u_{t+1} - \alpha u_t - \cdots - \alpha(1-\alpha)^k u_{t-k}\}^2$$

for different values of k , selecting the values of a and k which minimize it. In practice 
it seems that one does not need great precision in the exact determination of optimal a. 

Systems of type (50.136) are known, for obvious reasons, as exponentially weighted 
moving-average predictors. They have been studied in more detail by Brown (1959), 
Barnard (1959), Cox (1961), Box and Jenkins (1962), and Ward (1963). Winters (1960)
proposed an extension which includes seasonal movements.
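
The trial-and-error choice of $\alpha$ and $k$ can be sketched as a simple grid search. Everything specific here is an assumption made for illustration: the simulated series (a slowly wandering level plus noise), the grids of trial values, and the decision to start every sum at the same time-point so that sums for different $k$ are comparable.

```python
import random


def truncated_predictor(series, t, alpha, k):
    """alpha * sum over j = 0..k of (1-alpha)^j * u_{t-j}."""
    return sum(alpha * (1 - alpha) ** j * series[t - j] for j in range(k + 1))


def sum_of_squares(series, alpha, k, k_max):
    """Sum of squared one-step errors, starting at t = k_max for every k."""
    return sum(
        (series[t + 1] - truncated_predictor(series, t, alpha, k)) ** 2
        for t in range(k_max, len(series) - 1)
    )


def grid_search(series, alphas, ks):
    """Return (smallest sum of squares, alpha, k) over the trial values."""
    k_max = max(ks)
    return min(
        (sum_of_squares(series, a, k, k_max), a, k) for a in alphas for k in ks
    )


# Simulated series: a slowly wandering level observed with noise.
random.seed(2)
level, series = 0.0, []
for _ in range(200):
    level += random.gauss(0.0, 0.5)
    series.append(level + random.gauss(0.0, 1.0))

alphas = [i / 20 for i in range(1, 20)]
ks = [5, 10, 20]
best_sse, best_alpha, best_k = grid_search(series, alphas, ks)
print(best_alpha, best_k, round(best_sse, 1))
```

Printing the sum of squares for neighbouring values of $\alpha$ usually shows a flat minimum, which accords with the remark that great precision in $\alpha$ is not needed.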


50.49 At this point we must end, realizing that there are some branches of the
subject which might have been discussed at greater length and many more which
remain for future development. The reader who has stayed the course thus far will,
we hope, be willing to make allowances for the shortcomings of our work. A subject
which is growing as rapidly as ours does not shake down into a coherent structure ...
Nor is this a matter for regret, in that some of the changes in emphasis are due to ...
which will undoubtedly carry our subject to ...
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EXERCISES 

50.1 If $V_n$ is the determinant of the matrix of (50.12), show that

$$V_n = (1+\beta^2)V_{n-1} - \beta^2 V_{n-2}$$

and hence that

$$V_n = (1-\beta^{2n+2})/(1-\beta^2).$$


50.2 In the notation of 50.9 show that, for a Markoff scheme, 

w; = rj+2prj- x +p a rj- 2 , 

and hence that such a scheme is inadequate to represent the series of Example 50.2.



50.3 For i] of equation (50.26) show that if G(#) is the autocovariance function of s that f 
rj is ’ 11131 01 

( °° \ / c° \ 

,:-„**-**’) )<&) 

and hence that e and 17 have the same autocovariance generating function. 


' \ 


50.4 Verify equation (50.32). 

?" 61J 4 «— —. 
f < , giving a,,, a,,, . . ., a st anc j a([ _ show tlat ° n * 6 assum P td0n that it is of order 

“»<=«.-! t = \, I, 

' Pl+ ••• +a »-l,s-l/> s -l ’ 

1 = h 2, ..., k. 






(Durbin, 1960b) 
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50.6 In the notation of 50.10 show that if the likelihood is written as (1 — £ 2 ) 
«o that 


SO that 

j f(Q)da = (l-j3 2 )*, 

then 

*(8) ■ 0, "" > 




Hence show that approximately 


var b 


and derive equation (50.48) 

50.7 Verify equation (50.88). 


JwY / E ( d J9\ 

~ E \dp) / E \dp*J 


(Durbin, 1959b) 


•'v.f * — j i V' / ' 

50.8 If two series of linear autoregressive or moving-average type are generated from th 
same series of random elements, show that for all k 


00 oo 

£ P(ll)i P(22)Jc-i = £ P(l2)iP(21)Jc-i • 

i— — oo i = — oo 


00 

£ 

= — 00 


50.9 In Example 50.6, if the matrices F are the A-matrices of an autoregressive scheme, 
show that only one determines a process which is stationary. 

50.10 Generally in 50.26, by considering the case where B is the identity matrix, discuss 
the conditions under which an autoregressive scheme has an identifiable stationary solution. 


Envoi to Volume 3 


"Before your going down at the end of the Parliament, 1 
thought good to deliver unto you certain notes for your observation,
that serve aptly for the present time, to be imported afterwards
when you shall come abroad....

" Yourselves can witness that I never entered into the examina- 
tion of any cause without advisement, carrying ever a single 
eye to justice and truth; for, though I were content to hear matters 
argued and debated pro and contra, as all princes must that will 
understand what is right, yet I look ever as it were upon a plain 
table wherein is written neither partiality nor prejudice."


Elizabeth I, to her last Parliament 


APPENDIX TABLES 


1 The frequency function of the normal distribution 

2 The distribution function of the normal distribution 

3 Quantiles of the d.f. of $\chi^2$

4a The distribution function of $\chi^2$ for one degree of freedom, $0 \le \chi^2 \le 1$

4b The distribution function of $\chi^2$ for one degree of freedom, $1 \le \chi^2 \le 10$

5 Quantiles of the d.f. of t 

6 5 per cent points of z 

7 5 per cent points of F 

8 1 per cent points of z 

9 1 per cent points of F 

10 Symmetric functions. Augmented symmetries in terms of power-sums and 
vice versa 
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Appendix Table 2 
The table shows the area under 
e.g 


APPENDIX TABLES 

Distribution function of the normal distribution 


Deviate 

o-o + 

°'5 + 

1*0 + 

i*S + 

2*0 + 

2*5 + 

3*0 + 

3*5 

4 

r 

OOO 

5000 

6915 

8413 

9332 

9772 

9 a 379 

9*865 

9*77 

0*01 

5040 

6950 

8438 

9345 

9778 

92396 

9^809 


y /« 

0*02 

5080 

6985 

8461 

9357 

9783 

9*413 

9* 8 74 


9 7° 

OO3 

5120 

7019 

8485 

9370 

9788 

9 2 43° 

9 2 87o 


9*79 

0-04 

5160 

7054 

8508 

9382 

9793 

9*446 

9 2 88z 


9 00 

0-05 

5i99 

7088 

8531 

9394 

9798 

9*461 

9 2 886 


9 01 

rt 3Q r 

0*06 

5239 

7123 

8554 

9406 

9803 

9 a 477 

9 2 889 


9 oi 

0*07 

5279 

7157 

8577 

9418 

9808 

9*492 

9 2 893 


9 oz 

0-08 

53i9 

7190 

8599 

9429 

9812 

9*506 

9 2 897 


9 83 

~ 3Q~ 

0*09 

5359 

7224 

8621 

9441 

9817 

9*520 

9^900 


9 83 

0*10 

5398 

7257 

8643 

9452 

9821 

9 2 534 

9*03 


9 84 
^30 - 

O-II 

5438 

7291 

8665 

9463 

9826 

9 2 547 

9 3 o6 


9 85 

0*12 

5478 

7324 

8686 

9474 

9830 

9*560 

9 3 io 


9 85 

_ 30£ 

0*13 

5517 

7357 

8708 

9484 

9834 

9 2 573 

9 3 i3 


9 00 

30 

0*14 

5557 

7389 

8729 

9495 

9838 

9 2 S85 

9 3 i6 


9*80 

0*15 

5596 

7422 

8749 

9505 

9842 

9 2 59 8 

9 3 i8 


9*87 

0*16 

5636 

7454 

8770 

9515 

9846 

9*609 

9 s 2i 


9 87 

0*17 

5675 

7486 

8790 

9525 

9850 

9*621 

9 3 24 

1 

9 3 88 

0*18 

5714 

7517 

8810 

9535 

9854 

9*632 

9 3 26 


9 3 88 

0*19 

5753 

7549 

8830 

9545 

9857 

9 2 643 

9 3 29 


9 3 8g 

0*20 

5793 

7580 

8849 

9554 

9861 

9 2 653 

9*3i 


9 3 89 

0*21 

5832 

7611 

8869 

9564 

9864 

9*664 

9 3 34 


9 3 9° 

0*22 

5871 

7642 

8888 

9573 

9868 

9*674 

9 3 3 6 


9 S 9° 

023 

59io 

7673 

8907 

9582 

9871 

9*683 

9 3 38 


9 4 °4 

0*24 

5948 

7704 

8925 

9591 

9875 

9*693 

9 3 4° 


9 4 o8 

0-25 

5987 

7738 

8944 

9599 

9878 

9*702 

9 3 42 



9*12 

0*26 

6026 

7764 

8962 

9608 

9881 

9 2 7ii 

9 3 44 



9 4 i5 

0*27 

6064 

7794 

8980 

9616 

9884 

9*720 

9 3 46 



9 4 i8 

0*28 

6103 

7823 

8997 

9625 

9887 

9*728 

9*48 



9*22 

029 

6141 

7852 

9015 

9633 

9890 

9*736 

9 3 5° 


| 

9 4 25 

0*30 

6179 

7881 

9032 

9641 

9893 

9 2 744 

9 3 52 



9 4 28 

0*31 

6217 

7910 

9049 

9649 

9896 

9*752 

9 3 53 



9 4 3i 

0*32 

6255 

7939 

9066 

9656 

9898 

9*760 

9 3 55 



9 4 33 

o-33 

6293 

7967 

9082 

9664 

9901 

9*767 

9 3 57 



9 4 3 6 

°’34 

6331 

7995 

9099 

9671 

9904 

9 2 774 

9 3 58 



9 4 39 

o*3S 

6368 

8023 

9ii5 

9678 

9906 

9*781 

9 3 6o 



9 4 4i 

0*36 

6406 

8051 

9i3i 

9686 

9909 

9 2 788 

9 3 6i 


1 

9 4 43 

0*37 

6443 

8078 

9147 

9693 

9911 

9 2 795 

1 9 3 62 



9*46 

0*38 

6480 

8106 

9162 

9699 

9913 

9*801 

9*64 


1 

9 4 48 

0*39 

6517 

8133 

9177 

9706 

9916 

9*807 

9 3 6 S 



9 4 5° 

0*40 

6554 

8159 

9192 

9713 

9918 

9 2 8i3 

9 3 66 


9 4 52 

0*41 

6591 

8186 

9207 

9719 

9920 

9 2 8i9 

9 3 68 


9 4 54 

0*42 

6628 

8212 

9222 

9726 

9922 

9*825 

9 3 69 

i 

9 4 56 

o-43 

6664 

8238 

9236 

9732 

9925 

9*831 

9 3 7° 


9 4 58 

o*44 

6700 

8264 

9251 

9738 

9927 

9 2 836 

9 3 7i 


9 4 59 

o*45 

6736 

8289 

9265 

9744 

9929 

9*841 

9 3 72 


9 4 6i 

0*46 

6772 

831s 

9279 

9750 

9931 

9*846 

9 3 73 


9 4 63 

0*47 

6808 

8340 

9292 

9756 

9932 

9*851 

9 3 74 


9*64 

0*48 

6844 

8365 

9306 

9761 

9934 

9*856 

9 3 75 


o 4 66 

o*49 

6879 

8389 

I 

9319 

9767 

9936 

9*861 

9 3 76 


| 9*67 


Note Decimal points in the body of the table are omitted. Repeated 9's are indicated
by powers, e.g. $9^3 71$ stands for 0.99971.
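
For readers who wish to check an entry or to read the compressed notation, the following sketch may help. It is ours, not part of the table: it evaluates the normal distribution function from the error function in Python's standard library, and the helper `decode_entry` is a hypothetical name for expanding an entry written with a power of 9.

```python
import math


def normal_cdf(x):
    """Distribution function of the standard normal via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def decode_entry(nines, digits):
    """Expand an entry such as 9^3 71 (nines=3, digits='71') to a decimal."""
    return float("0." + "9" * nines + digits)


# The entry 6915 at deviate 0.5 and the entry 9^3 71 at deviate 3.44.
print(round(normal_cdf(0.5), 4))
print(decode_entry(3, "71"), round(normal_cdf(3.44), 5))
```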
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APPENDIX TABLES 

Appendix Table 4a Distribution function of $\chi^2$ for one degree of freedom

$\chi^2 = 0$ to $\chi^2 = 1$ by steps of 0.01


$\chi^2$

P

$\Delta$

0 

i -ooooo 

7966 

0*01 

092034 

3280 

0*02 

0*88754 

2505 

0 03 

0*86249 

2X01 

OO4 

0*84148 

1842 

0*05 

0*82306 

1656 

0*06 

0*80650 

I5l6 

0*07 

0*79134 

I404 

o*o8 

0*77730 

1312 

0-09 

0*76418 

1235 

0*10 

0-75183 

H69 

0*11 

0*74014 

IIII 

0*12 

0*72903 

1060 

013 

071843 

IOIS 

0-14 

0-70828 

974 

015 

0-69854 

938 

o*i6 

0-68916 

905 

0*17 

o-68oxi 

874 

<>•18 

0-67137 

845 

0*19 

0-66292 

820 

0*20 

0-65472 

795 

0*21 

0-64677 

773 

0*22 

0-63904 

752 

023 

0-63152 

731 

0-24 

0-62421 

713 

0*25 

0-61708 

696 

0*26 

0-61012 

679 

0-27 

060333 

663 

0*28 

0-59670 

648 

0-29 

0-59022 

634 

0-30 

0-58388 

620 

0*31 

0-57768 

607 

0-32 

0-57161 

595 

°'33 

0-56566 

583 

o -34 

0-55983 

572 

0*35 

o-554i 1 

560 

0-36 

0-54851 

551 

0-37 

0-54300 

540 

0-38 

0-53760 

530 

0-39 

0-53230 

521 

0-40 

0-52709 

512 

0-41 

0-52197 

503 

0-42 

0-51694 

495 

o-43 

0-51199 

487 

0-44 

0-50712 

479 

o-45 

0-50233 

471 

0-46 

0-49762 

463 

0-47 

0-49299 

457 

0-48 

0-48842 

449 

3 '49 

0-48393 

443 

5-50 

0-47950 

436 


0-50 

o-si 

0-52 

o-S3 

o-S4 

°‘S5 

0-56 

o-S7 

0-58 

o-59 

o-6o 

o-6i 

0-62 

0-63 

0-64 

0-65 

o-66 

0-67 

o-68 

0-69 

0-70 

0-71 

0-72 

0-73 

o-74 

o-75 

0-76 

0-77 

0-78 

0-79 

o-8o 

o-8i 

0-82 

0-83 

0-84 

0-85 

o-86 

0-87 

o-88 

0-89 

0-90 

091 

092 

0-93 

094 

o-95 

096 

o *97 

0*98 

0-99 

1-00 


047950 

047514 

0-47084 
0 - 46661 
046243 

0-45832 

0-45426 

0-45026 

0-44631 

0-44242 

0-43858 

0-43479 

0-43105 

0-42736 

0-42371 

0-42011 

0 - 4 x 656 

0-41305 

0-40959 

0-40616 

0-40278 

0-39944 

0-39614 

0-39288 

0-38966 

0-38648 

0-38333 

0-38022 

0-377x4 

0-37410 

0-37109 

0-36812 

0-36518 

0-36227 

0-35940 

0-35655 

0-35374 

0-35096 

0-34820 

o-34548 

0-34278 

0-34011 

0-33747 

0-33486 

0-33228 

0-32972 

0-32719 

0-32468 

0-32220 

0-31974 

0-31731 


436 

430 

423 

418 

411 

406 

400 

395 

389 

384 

379 

374 

369 

365 

360 

355 

351 

346 

343 

338 

334 

330 

326 

322 

318 

315 

311 

308 

304 

301 

297 

294 

291 

287 

285 

281 

278 

276 

272 

270 

267 

264 

261 

258 

256 

253 

251 

248 

246 

243 

241 
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Tab* 4b TiTlO 


$\chi^2$ for one degree of freedom
by steps of 0.1


i-o 

i*i 

1*3 

i*3 

1-4 

i'5 

i-6 


5*0 

5’i 

5‘2 

5'3 

>‘4 

?*5 


0-31731 

0-29427 

0-27332 

0-25421 

0-23672 

0-22067 

0-20590 


17 

0-19229 

i-8 

0-17971 

i*9 

0-16808 

2*0 

0-15730 

2-1 

0-14730 

2*2 

0-13801 

2-3 

0-12937 

2*4 

0-12134 

2*5 

0-11385 

2-6 

0-10686 

27 

0-10035 

2*8 

0-09426 

2*9 

0-08858 

3*0 

0-08326 

3*1 

007829 

3*2 

007364 

3*3 

0-06928 

3*4 

006520 

3*5 

0-06137 

3-6 

0-05778 

3*7 

0-05441 

3*8 

005125 

3’9 

0-04829 

4*° 

0-04550 

4*i 

0-04288 

4*2 

0-04042 

4*3 

0-03811 

4*4 

°*°3594 

4*5 

0-03389 

4*6 

0-03197 

4*7 

0-03016 

4*8 

0-02846 

4*9 

0-02686 


°‘° 2 535 

°-°2393 

o’02259 

0-02133 

0’020I4 

0-01902 


2304 

2095 

1911 

1749 

1605 

1477 

1361 

I25 8 

1163 

1078 

1000 

929 

864 

803 

749 

699 

651 

609 

568 

532 

497 

465 

436 

408 

383 

359 

337 

316 

296 

279 

262 

246 

231 

217 

205 

192 

181 

170 

160 

151 

M2 

134 

126 

U 9 

112 

I06 


5’6 

5’7 

5’8 

5’9 

6-o 

6-i 

6-2 

6-3 

6-4 

6-5 

6-6 

6-7 

6-8 

6- 9 

7- 0 
71 
7-2 
7’3 
7’4 
7’5 
7-6 

7’7 

7- 8 

7*9 

8- o 

8-i 

8-2 

8-3 

8-4 

8-5 

8-6 

8-7 

8-8 

8- 9 

9- 0 

9'i 

9*2 

9’3 

9*4 

9‘5 

9*6 

97 

9*8 

99 

io*o 


0-01902 

0-01796 

0-01697 

0-01603 

0-01514 

0-01431 

0-01352 

0-01278 

0-01207 

0-01141 

0-01079 

0-01020 

0-00964 

0-009I2 

0-00862 

0-008l5 

0-0077I 

0'00729 

0-00690 

0*00652 

0-006l7 

0-00584 

0-00552 

0'00522 

o-00494 

0-00468 

0-00443 

0-00419 

000396 

0-00375 

0*00355 

0*00336 

0*00318 

0*00301 

0-00285 

0-00270 

0*00256 

0*00242 

0*00229 

0*00217 

0-00205 

0*00195 

0*00184 

0*00174 

000165 

0*00157 



106 

99 

89 

83 

79 

74 

7i 

66 

62 

59 

56 

52 

50 

47 

44 

42 

39 

38 

35 

33 

32 

30 

28 

26 

25 

24 

23 

21 

20 

19 

18 

17 

16 

15 

14 

H 

13 

12 

12 

10 

II 

10 

9 

8 

8 
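
The entries of Tables 4a and 4b can be checked with a short calculation. Judging from the printed values, which fall from 1 at $\chi^2 = 0$, the tabulated P appears to be the probability of exceeding the stated value; that reading is our assumption. For one degree of freedom this probability is the complementary error function of $\sqrt{\chi^2/2}$, and the name `chi2_one_df_upper_tail` is ours.

```python
import math


def chi2_one_df_upper_tail(x):
    """P(chi-squared with 1 d.f. exceeds x), i.e. erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


# Compare with the printed entries 0.47950 (at 0.50) and 0.31731 (at 1.0).
for x in (0.5, 1.0):
    print(x, round(chi2_one_df_upper_tail(x), 5))
```

The third column of each table gives the differences between successive values of P.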
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Appendix Table 7 5 per cent points of the variance ratio F 
(values at which the d.f. = 0-95) 

(Reproduced from Sir Ronald Fisher and Dr F. Yates : Statistical Tables for Biological, 
Medical and Agricultural Research, Oliver and Boyd Ltd., Edinburgh, by kind permission 

of the authors and publishers) 
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(Exercise 40.6) 236. H su, P. L., latent roots, 258; multivariate Beta 

Hatanaka, M., spectrum analysts, 468, M ' distrjbution> 260. 

492. tables 87’ Hultquist, R. A., Model II AV, 59, 62, 63. 

Healy, M. J. R-> trans ^ a , Example 42.4) Hunter, J. S., response surfaces, 158. 

data on male premolars, ( P Hurwitz, W. N., sample surveys, 166; choice 

„ ?°~n r R estimation in unbalanced of selection probabilities, 202; non- 

Henderson, C. R-, est response in sample surveys, (Exercise 

Model II AV, 73. . , | an 71 I'ift 

Herbach, L. H., testing hypotheses in Model * ' 


Imhof, J. P., mixed model in AV, 77. 
Incidence matrix, of an experiment, 125. 
Independence, LR test of, (Exercises 41.10-11) 
261-2, 270-1, 281, (Exercise 42.4) 282; 
equivalent to testing equality of latent 
roots, 291. 

Index number, from component analysis, 295. 


II AV, 68, (Exercises 36.5-6) 82-3. 

Hess, I., formation of strata, 186; ratio estima¬ 
tion, 223. 

Hext, G., spectrum analysis, 467. 

Hierarchical classification, see Classification, 
hierarchical. 

Higham, J. A., trend fitting, (Exercise 45.6) 

400-1 mucx liuiiiucjl, iium buuipvuuu ana. 

Hills, M., discrimination, (Example 44.5) 329. Intensity, 411; see Spectrum, Time-series. 
Hodges, J. L., median estimators in AV, 110; Interactions in AV, 13, 36; Tukey’s test for, 
formation of strata, 185. ' (Example 35.3) 23; zero for any weights if 

Hoel, P. G., regression designs, 161; distribu- for one set, 26; independent and tied, 

tion of dispersion determinant, (Exercise 75-6; unit-treatment, 80. 

41.8) 261. Interactive errors, 80. 

Hogg, R. V., nested hypotheses, (Exercise Inter-block information, 146-51, (Exercises 
37.2) 114; testing degree of polynomial 38.14-16) 164. 
regression, (Exercise 37.3) 114; power of Inverse sampling, (Exercise 40.3) 235. 
tests, 281. Ito, K„ robustness of T 2 test, 281; power of 

Holt, C. C., exponential weighting, 501. 1L - r - ■ r ' n * n 

Homogeneity, LR tests of, 87, (Exercises 


' - 7 

tests for mean-vectors, 281-2. 

mogeneity, jujk. tests or, a/, (Exercises 

37.1- 3) 113-14, 264-9, (Example 42.1) James, A. T., latent 
270, (Example 42.2) 272-3, (Exercises Jeffers, J. N R see 

42.1- 3) 282. Tenlrin. a iu ** 

Hopkins, C. E., discrimination, 327, (Example 

44.5) 328-9. 

Horsnell, G., robustness of AV, 98. 

Horvitz, D. G., sampling with unequal 
probabilities, 173. 


roots, 260. 

... Freeman, G. H. 

Jenkins, G. M., joint distribution of serial 
correlations, 437, 449, (Exercises 48.18- 
48.19) 453; non-null distribution of serial 
correlations in Markoff case, 444, 445, 
(Exercise 48.17) 452; spectrum analysis, 
466; exponential weighting, 501. 
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John, S., discrimination* (Example 44.5) 329, 
(Exercise 44.2) 339. 

Johnson, A. H. L., square root transformations, 
(Exercise 37.15) 117. 

Johnson, N. L., sequential AV, 79; quota 
sampling, (Exercise 40.11) 237-8; variate- 
difference method, 391. 

Kelley, T. L., psychological data for canonical 
analysis, (Example 43.5) 303. 
Kempthorne, O., models in AV, 75; complete 
randomization, 81; expected mean squares 
in randomized blocks and Latin squares, 
138; BIB analyses, 146; PBIB analysis, 
153; lattice designs, 154; confounding, 
157; sequences of experiments, 158. 

Kendall, D. G., logarithmic transformation, 
92. 

Kendall, M. G., AV using ranks, (Exercise
37.14) 116; M-dimensional geometry, 243
f.n.; computation in component analysis,
289; ranking for principal components,
295; factor analysis, 310; classification,
(Example 44.7) 338-9; central
limit for moving average weights, 370
f.n.; bias in serial correlations, 435, (Exercises
48.4, 48.11) 450-1; distribution of
serial correlation in Markoff case, 444.

Keuls, M., studentized range test in AV, 
45-6, (Exercise 35.10) 54. 

Khamis, S. H., sample survey theory, 170, 
(Exercise 39.1) 204. 

Khintchin, A., ergodic theorem, 407, 410. 

Kiefer, J., optimal experiments, 130; regression 
designs, 158. 

Kish, L., estimation of variance, (Exercise 
39.16) 207; ratio estimation, 223. 

Koop, J. C., linear estimation in sample 
surveys, 174. 

Koopmans, T. C., serial correlations, 442. 

Kruskal, J. B., monotone transformations, 88. 

Kshirsagar, A. M., multivariate Beta distribution,
260; Bartlett decomposition, (Exercises
41.16-17) 262-3.
Lahiri, D. B., selection with unequal probabilities,
(Exercise 39.11) 206-7; removal
of bias in ratio estimators, 223.

Latent roots of a dispersion matrix, null
distribution, 255-8, 259, 260, (Exercise
41.14) 262; (Example 42.4) 280-1; testing
equality equivalent to testing independence,
291, (Example 43.3) 292; testing
zero value, 292; testing equality of small
roots, 292-3; large-sample results, 293-4;
in discrimination, 326; see Canonical
correlations, Component analysis, Factor
analysis.

Latin squares, 134-7, (Exercises 38.5-7) 162;
robustness of normal theory, 138-9;
factorial experiments in, 155.

Lattice designs, 153-4.

Lawley, D. N., distribution of LR statistic, 269;
T₀² test, 281; component analysis,
(Example 43.2) 290-1, 293; canonical
correlations, 305, 306; factor analysis, 308, 310,
(Exercise 43.11) 313.

Least squares, in sampling without replacement,
167-8; in discrimination, 323, (Exercises
44.10-11) 341; for moving averages,
366-7, 374-5; in autoregressive series,
476, 499; see Linear model.

Ledermann, W., component analysis, 285.

Lehmann, E. L., median estimators in AV,
110; multivariate tests, 281.

Leipnik, R. B., serial correlation, 444, 445.

Levene, H., tests of randomness, 354, 355,
(Exercises 45.6-7) 364.

Levine, A., polynomial regression designs, 161.

Lewis, T., canonical analysis in educational
research, 305.

Likelihood Ratio (LR) tests, in Model II AV,
(Exercises 36.5-6) 82-3; for "nested"
hypotheses, 87, (Exercises 37.1-3) 113-14;
in multivariate analysis, 265-84; see
Homogeneity, Independence, Regression,
Sphericity.

Linear model, AV in (Model I), 1-56;
decomposition of non-central quadratic forms,
2-5; removal of singularity, 12; choice of
weights, 25-6; general disproportional
frequencies case, 38; combination of
tests, 40-3; multiple comparisons, 43-9;
analysis of covariance, 49-52; extension
of model to further parameters, 51-2;
transformations to, 85-8; missing
observations, 111-13; for block experiments,
125-9; in sampling without replacement,
167-8; multivariate, 273-6, (Example
42.3) 277-80; see Analysis of Variance,
Classification, one-way, etc.

Linhart, H., discrimination, 327. 

Lipton, S., data on male premolars, (Example
42.4) 280-1.

Logarithmic transformations, (Examples 37.2,
37.4) 91, 93; 95-6.



Logit transformations, 94.

Lomnicki, Z. ..., normality of a ...,
(Exercise 49.8) 470.

Low, ..., ... unbalanced Model
II AV, 73.

Lu^htw^/diim 336. 

Mahalanobis, P. C., D² statistic and generalized
distance, 259-60.

Main effects in AV, 12, 36.

Manley, G., meteorological data, (Example
43.4) 295.

Mann, H. B., complete sets of Latin Squares,
137; construction of BIB, 143; confounding,
157; difference-sign test, 357,
(Exercise 45.3) 363; rank correlation test,
358; LS in autoregressive series,
476.


Markoff series, autocorrelations, (Example 47.2)
405; backwards, (Example 47.3) 406;
correlogram and spectrum, (Example 47.7)
418-19; partial autocorrelations, 424-5;
cumulants and normality, (Exercise 47.2)
426; (Exercise 47.3) 427; grouping,
(Exercise 47.15) 428; standard error of
serial correlations, (Example 48.3) 432;
covariance of serial correlations, (Example
48.4) 433; bias in serial correlation,
(Example 48.7) 435, (Exercises 48.4-5;
48.8, 48.11) 450-1; to higher order, 435;
non-null distribution of serial correlations,
443-4, 447-9, (Exercises 48.13, 48.17)
451-2; effect of starting-point, 472-4;
multivariate, 496; in forecasting, 501.

Marriott, F. H. C., bias in serial correlations,
435.

Marsaglia, G., decomposition of quadratic
forms in normal variables, 4.

Mauchly, J. W., sphericity test, 271.

Mauldon, J. G., multivariate estimation
paradoxes, 281.

Maxwell, A. E., component analysis, (Example
43.2) 290-1; factor analysis, 308, 310,
(Exercise 43.11) 313.

Mean, moments of, 168-9.

Mean squares (MS), expected values of in
Model II AV, 63, 69.

Median tests in AV, 108-11, (Exercises
37.18-20) 117-18.

Mickey, M. R., unbiassed regression-type
estimators, 219.

Mikhail, N. N., power of tests for
vectors, 281-2.

Mill, J. S., on experiments, 120.

Mixed models, 77-9; for recovery of
inter-block information, 146-51.

Models I, II; see Analysis of variance.

Mood, A. M., median tests, 109-11, (Exercises
37.18-20) 117-18; latent roots, 2...

Moore, G. H., tests of randomness, 354,
356.

Moran, P. A. P., Slutzky sinusoidal limit, ...;
moments of serial correlations, 435-7,
(Exercises 48.5-6, 48.12) 450-1.

Moving average, 367-402; as LS polynomial,
366-7; formulae to degree 5, ...;
formulae in terms of differences, ...;
Spencer's 15- and 21-point formulae,
(Examples 46.3-4) 372; end-effects, 373-4;
using orthogonal polynomials, 374-5;
see Moving average series, Seasonal variation,
Trend.

Moving average series, 412-16; and
autoregressive series, 417, 474-6; estimates
and tests of fit, 481-4, (Exercise 50...)
503; as errors in autoregressive series,
484-6; exponentially weighted, 501; see
Autoregressive series, Time-series.

Mudholkar, G. S., power functions of
multivariate tests, 281.

Muller, E.-R., BIB designs, 143.

Multinormal distribution, multivariate normal
distribution; see Multivariate analysis.

Multi-phase sampling, 228.

Multiple comparisons in AV, 43-9.

Multi-stage sampling, 189-204, (Exercises 39.24,
39.28-9) 208-9; estimator, 191; with
equal probabilities, 191-4; with unequal
probabilities, 195-7; estimation of variance,
197-201, 223-4; cost function and
minimum variance, 201-2; choice of
probabilities, 202-4; efficiency, 204;
stratification, 204; ratio and regression
estimators, 223-4; domains of study, 232-4.

Multivariate analysis, 239-341; in time-series,
486-96; see Canonical variables,
Component analysis, Correlation determinant,
Discrimination and classification,
Dispersion matrix, Factor analysis,
Generalized variance, Homogeneity, Hotelling's
T², Independence, Latent roots, Regression,
Sphericity, Wishart distribution.

...ra, B., variate-difference method,
(Exercises 48.15-16) 452.
Murthy, M. N., sample survey theory, 166;
sufficiency in surveys, 176, (Exercises
39.30-1) 209-10; unbiassed ratio estimators,
...

..., (Exercise 38.13) 163.

Nair, K. R., PBIB, 153.

Nanjamma, N. S., unbiassed ratio estimators,
223, (Exercise 40.1) 234.

Narain, R. D., tests of independence unbiassed,
271.

Negative binomial distribution, angular
transformation, 95, (Exercise 37.5) 114.

Nerlove, M., spectrum analysis, 467.

Nested classification, 31 f.n.; see
Classification, hierarchical.

Nested hypotheses, 87, (Exercises 37.1-3)
113-14.

..., ... in AV,
45-6, (Exercise 35.10) 54.

Neyman, J., transformation bias, 95; stratified
sampling, 180; two-phase sampling, 224.

Nieto de Pascual, J., unbiassed ratio estimators,
213, 215, 217, 223.

Noether, G. E., ranks test for BIB, 151; rank
serial correlation test, 360.

Non-central quadratic forms, decomposition
of, 2-5.

Normal distribution, logarithmic transformation
of sample variance, (Examples 37.2,
37.4) 91, 93; square root transformation of
sample variance, (Example 37.5) 94; see also
Bivariate normal, Multivariate analysis.

Normal scores, transformation to, 94; AV
using, 105, 107-8.

Normalizing transformations, 93-4.

Norton, H. W., review of Latin squares, 137.

Nuisance factors, 124; two, 132; three or more,
135-7.

Ogawa, J., robustness of F-test in randomized
blocks and BIB, 139, 151.

Olkin, I., multivariate ratio estimators, 216, 223.

One-way classification, see Classification,
one-way.

Orcutt, G. H., autocorrelated errors in
regression, 497-8.

Orthogonal squares, 136, (Exercises 38.6-7)
162; factorial experiments in, 155.

Paired comparisons, 152.

Parker, R., Euler's false conjecture, 136.

Partially balanced incomplete blocks (PBIB),
152-3.

Parzen, E., spectrum analysis, 466, 467,
470.

Pathak, P. K., ... theory, 170-1,
(Exercise 39.30) 209, ...,
(Exercise 40.3) 235.

Patterson, H. D., sampling on successive
occasions, (Exercises 40.9-10) 236-7.

Pearce, S. C., review of non-orthogonal AV,
38.

Pearson, E. S., homogeneity tests, (Example
42.2) 272.

Periodogram, 411-12; see Spectrum,
Time-series.

Permutation tests, ..., 138-9, 151.

Phases, in time-series, 353-5, (Exercise 45.1)
363; in harmonic analysis, 454; see
Two-phase sampling, Multi-phase sampling.

Pillai, K. C. S., distribution of latent roots,
259, 260.

Pitman, E. J. G., permutation test in AV,
106-7, (Exercise 37.13) 116.

Plackett, R. L., models in AV, 75, 81;
duplicated observations in AV, 113.

Poisson distribution, square root
transformations, (Example 37.1) 89-90, (Exercises
37.15-17) 117; sampling, (Exercises 39.13,
39.23) 207-8.

Polynomial, testing degree in regression,
(Exercise 37.3) 114; regression designs,
158-61.

Pooling procedures, in AV, (Example 36.6)
68; in regression, (Exercise 37.3) 114.

Pope, J. A., bias in serial correlations, 435.

Posten, H. O., power of LR test, 282.

Preference experiments, 151-2.

Principal components, 287; see Component
analysis.

Probabilities proportional to size (p.p.s.),
195-7, 204; see Unequal probabilities,
Surveys.

Probit transformations, 94.

Product estimator, (Exercise 40.2) 234-5.

Puri, M. L., median estimators in AV, 111.

Quadratic forms, decomposition of, 2-5.

Quasi-random sampling, 188.

Quenouille, M. H., sequences of experiments,
158; method of bias-reduction, 216, 264,
306, 435; variate-difference method, 393,
(Exercises 46.7-11) 401; trend-fitting,

393-4, 396, (Exercise 46.12) 402;
large-sample theory of serial correlations for
autoregressive series, 433, (Exercise 48.4)
...; non-null distribution of serial
correlations ..., (Exercise 48.13) 451 ...;
robustness of serial correlation theory, 449;
unequal time-intervals in time-series, ...;
partial autocorrelations and test of fit in
autoregressive series, 478; multivariate
time-series, 486-9, 492; series with common
errors, (Exercise 50.8) 503.

Quota sampling, (Exercise 40.11) 237-8.

Raj, D., sufficiency in surveys, 170, (Exercise
39.1) 204; unequal probabilities, 176,
(Exercises 39.9-10) 206; two-phase
sampling for probabilities, 228.

Rajalakshman, D. V., multivariate time-series,
496.

Randomization, complete, 79; in experiments,
120-5.

Randomized blocks, 79-80, 130-2, (Exercises
38.3-4) 162; robustness of normal theory,
138-9; factorial experiments in, 155.

Randomness, tests of, 360; see Difference-sign,
Phases, Rank correlation, Records,
Serial correlation, Turning-points.

Range tests in AV, 44-6.

Rank, transformations, 94; AV using, 105,
107-9, (Exercises 37.13-14) 116; test in
BIB, 151; correlation tests in time-series,
357-60; serial correlation test, 360,
(Exercise 45.5) 363.

Rao, ..., ... analyses, ...;
discrimination, ...

Rao, J. N. K., unequal probabilities, 174;
reduction of bias in ratio estimator, 216;
random formation of strata, (Exercise
40.6) 236.

Ratio estimators, biassedness, 211-12, 216-18,
222-3; consistency, 212; modified, 212-13,
(Exercises 40.1-2) 234-5, (Exercises 40.13,
40.14) 238; variance comparisons, ...;
in stratified and multi-stage sampling,
223-4; asymptotically linear, 223-4; in
two-phase sampling, 227-8.

Realization, 404.

Recognizable individuals, in sample survey
theory, 166, 170-1, 174-5.

Records test, 360, (Exercises 45.8-9) 364-5.

Recovery of inter-block information, see
Inter-block information.

Rees, D. H., non-orthogonal additive
...-way cross-classification, 38; ...,
259; ...

Regression, testing degree of polynomial, ...,
(Exercise 37.3) 114; transformations, ...,
(Exercise 37.9) 115; designs, 158-61; in
multivariate analysis, 273-6, (Example
42.3) 277-80, (Exercises 42.15-16) ...;
... errors, 497-8; see
Autoregressive series, ...

Regression estimators, 218-19; unbiassed,
219-22; in stratified and multi-stage
sampling, 223-4; asymptotically linear,
223-4; in two-phase sampling, 227-8.

Rejective sampling, 174.

Replacement, sampling with and without, 166;
see Surveys.

Residuals, analysis of, 96-7; dispersion matrix
of, 274, 275-6.

Response surfaces, 158.

Rhodes, E. C., trend-fitting, 393.

Robson, D. S., ratio estimators, 214, 223;
product estimator, (Exercise 40.2) 235-6.

Robustness, of AV procedures, 97-108.

Romanovsky, V., Slutzky sinusoidal limit, 415.

Rosenblatt, M., spectrum analysis, 468;
regression with autocorrelated errors, 498.

Ross, A., allocation in stratified sampling,
(Exercise 39.20) 208; unbiassed ratio
estimators, 212.

Rotatable designs, 158.

Roy, J., inter-block information, 151.

Roy, S. N., mixed model in AV, 79;
distribution of latent roots, 258, 259;
Mahalanobis's D² statistics, 259.

Rubin, H., serial correlations, 442.

Sampford, M. R., ... sampling, (Exercise
40.3) 235.

Sample surveys, see Surveys.

Satterthwaite, F. E., approximate F-test in
AV, (Exercise 36.7) 83; random balance
..., 130.

Scha..., ... vectors, 282.

Scheffé, H., Tukey's test for additivity, 25;
interactions zero for any weights if for
one set, 26; analysis of cross-classified
data with empty cells, 30; three-way
hierarchical classification, 34; multiple
comparisons, 46; simultaneous confidence
intervals for all contrasts, 48-9, (Exercises
35.11-13) 54-5; analysis of covariance, 52;
expected MS in AV, 69; Model II three-way
hierarchical classification, 71;
confidence intervals in Model II AV, 71,
(Exercise 36.15) 84; models for AV, 75;
mixed model, 75, 77, 79; robustness of
AV, 98; AV of cell means, (Exercise 37.7)
115; interaction in Latin squares, 135;
robustness in randomized blocks and
Latin squares, 138-9; problem of two
means, 281.

..., W. J., robustness of T² test, 281.

Scott, E. L., transformation bias, 95.

Searle, S. R., estimation in unbalanced Model
...

Seasonal variation, 349-50, 396-400, 403; and
spectrum, 467-8; see Moving average,
Trend.

Seber, G. A. F., orthogonality in AV, 37; 

power of multivariate tests, 281. 
Self-weighting designs, 195, 202. 

..., A. R., selection schemes with unequal
probabilities, (Exercises 39.5-6) 205-6.
Sequential analysis of variance, 79. 

Serial correlation, using ranks, 360, (Exercise
45.5) 363; generally, 361-2, (Exercises
45.10-11) 365; and variances of differences,
391-2; and variate-difference
method, 393, (Exercises 46.10-11) 401;
large-sample theory, 431-3, (Exercises
48.3, 48.9-10) 450-1; bias, 433-5; exact
moments, 435-7, (Exercises 48.5-6, 48.12,
48.18-19) 450-3; distribution in normal
case, 437-49, (Exercise 48.20) 453;
transformations, 445, (Exercise 48.13) 451-2;
see Autocorrelation.

Seshadri, V., inter-block information, 150. 

Sethi, V. K., formation of strata, 186; un¬ 
biassed ratio estimators, 223. 

Shah, K. R., inter-block information, 151. 

Sharma, D., Tukey’s test for additivity, 
25. 

Shiskin, J., seasonal variation, 400. 

Shrikande, S. S., Euler’s false conjecture, 136; 
PBIB, 153. 

Silvey, S. D., power of tests, 281. 

Simaika, J. B., power of T 2 test, 281. 
Simultaneous test procedures, 44-9, (Exercises 
35.11-14, 35.16-17, 35.19) 54-6. 

Siotani, M., confidence intervals for contrasts in
AV, 49.

Sitgreaves, R., discrimination, (Exercise 44.2)
339.

Slater, P., discrimination data on neurotics,
(Example 44.4) 325-6.

Slutzky-Yule effect, 378; sinusoidal limit
theorem, (Example 47.6) 414-15.

Smith, B. Babington, AV using ranks, (Exercise
37.14) 116.

Smith, C. A. B., quadratic discrimination, 322.

Smith, K., polynomial regression designs, 161,
(Exercise 38.19) 165.

Snedecor, G. W., Yates' method of weighted
squares of means, 30.

Solomon, H., cluster analysis, 337. 

Spectrum, spectral density, spectral function,
410-11; as autocorrelation g.f., 410; of
Markoff and Yule series, (Examples 47.7-8)
418-20; for continuous series, 422; effect
of filtering, 423-4; analysis, 454-71;
harmonic analysis, 454-5; Nyquist frequency,
aliases, 455-7; effect of a harmonic
component, 458-60; effect of other
periodicities, trend, 460-1; test for the spectral
ordinate, 461-2; smoothing, 463-4;
calculation of, 464-6; estimation of density,
466-7; and seasonal variation, 467-8;
unequal time-intervals, 468-9; cross-spectra,
491-6; coherence, 491; polyspectra,
492; see Time-series.

Spencer's 15- and 21-point formulae,
(Examples 46.3-4) 372.

Sphericity test, 271-2, (Example 43.3)
292.

Split-plot designs, 157.

Sprott, D. A., BIB designs, 143-4; recovery of
inter-block information, (Exercise 38.16)
164.

Square root transformations, (Example 37.1)
89-90, 95, (Exercises 37.15-17) 117.

Srivastava, S. R., pooling procedures in 
Model II AV, (Example 36.6) 68. 

Stabilization of variance, 88-92. 

Stages, see Multi-stage sampling. 

Stationary time-series, 404; see Time-series. 

Step-by-step AV test procedures, 42-6,
(Exercise 35.18) 56.

Stevens, W. L., non-orthogonal three-way 
cross-classification, 38. 

Stratified sampling, 177-87, (Exercises 39.13— 
39.15, 39.17-21) 207-8; motivation for, 
177-9; choice of sample sizes, 180-2; 
strata and blocks, 182; MV allocation for 
fixed cost, 183; formation of strata, 
... in multi-stage ..., with two ...,
(Exercises 40.15-16) 238; quota sampling,
...; ... (Exercise 45.4) 363; rank
correlation tests, 360; rank serial
correlation tests, 360, (Exercise 45.5) 363.

"Student" (W. S. Gosset), LSD test in AV,
43.

Studentized range tests in AV, 44-6.

Successive occasions, sampling on, (Exercises
40.9-10) 236-7.

Sufficiency, in Model II AV, 62, 73, (Exercise
36.12) 83; in sample survey theory,
170-1, 176, (Exercise 39.1) 204, (Exercises
39.30-1) 209-10.

Supplementary information, 211-38; see Ratio
estimators, Regression estimators,
Two-phase sampling.

Surveys, compared with experiments, 119,
182; theory, 166-238; random sampling
without replacement, 167-8; moments of
sample mean, 168-9; sufficiency, 170-1;
see Domains of study, Multi-stage sampling,
Ratio estimators, Regression estimators,
Stratified sampling, Two-phase
sampling, Unequal probabilities.

Sweeny, H. C., polynomial regression designs,
161.

Systematic sampling, 187-8.

Tamura, R., multivariate distribution-free
location tests, 282.

Taylor, L. R., transformation tables, 87.

Technical errors, 80.

Thompson, D. J., sampling with unequal
probabilities, 173.

Thompson, W. A., Jr., negative estimates of
variance in Model II AV, 71.

Tidwell, P. W., transformations, 86,
(Exercise 37.9) 115.

Tied interactions, 75-6.

Time-series, 342-503; general, 342-8;
components of, 349-50, 366; tests of
randomness, ...; ... intensity, 411;
periodogram, 411-12; moving average series,
412-16; autoregressive series, 416-21;
continuous ..., ...; filters and transfer
functions, 423-4; finite and circular
processes, 425-6; serial correlations, 360-2,
431-53; spectrum analysis, 410-11, 454-71;
estimation and testing in autoregressive and
moving average series, 472-86, 497-500;
multivariate, 486-96; systems of equations,
496-7; forecasting, 500-2; see
Autocorrelation, Autoregressive, Markoff,
Moving average series, Randomness, tests of,
Serial correlation, Spectrum, Trend,
Variate-difference, Yule series.

Tin, M., ratio estimators, 217-18, (Exercises
40.13-14) 238.

Tintner, G., variate-difference method, 390,
391.

Tocher, K. D., missing observations, 112;
other spoilt experiments, 113; block
experiments, 124, 140, (Exercises 38.2-3,
38.8-9) 162-3; inter-block information,
150, (Exercise 38.15) 164.

Transfer function, 423-4; see Time-series.

Transformations, to the normal linear model,
85-8; purposes of, 87-8; monotone, 88;
variance-stabilizing, 88-92; normalizing,
93-4; to additivity, 94-5; removal of bias,
95-6; analysis of residuals, 96-7; see also
Angular, Logarithmic, and Square Root
transformations.

Treatments, 124; AV for, 155.

Trend, 349-50, 366; tests against, 355, 360;
effect of elimination by moving averages
(Slutzky-Yule effect) 375-84, 393-6,
(Exercise 46.12) 402; see Moving average.

Tryon, R. C., cluster analysis, 337.

Tschuprow, A. A., stratified sampling, 180.

Tukey, J. W., test for additivity, 25; multiple
... combinations, 46-7, (Exercise 35.13) 55;
expected MS in AV, 69; estimation in
unbalanced Model II AV, 73; models in
AV, 75, 77; moments of variance
estimators in AV, (Exercise 36.10) 83;
transformations, 90, (Exercise 37.4) 114;
analysis of residuals, 96; spectrum analysis,
... 205-7; ..., 173-5; with replacement, 176-...;
estimation, ...; ..., 177-9; and cluster-...;
... chosen to ..., 223-4; two-phase
sampling to determine, 228; see
Multi-stage sampling, Stratified sampling,
Surveys.

Uniform sampling fraction (USF), 180.

Unit errors, 80.

Van Elteren, P., ranks test for BIB, 151.

Variance, see ... of variance.

Variance-stabilizing transformations, 88-92.

Variate-difference method, 384-93, (Exercises
46.7-11) 401, (Exercises 48.15-16) 452.

Verhagen, A. M. W., proof of Scheffé's
all-contrasts method, (Exercise 35.19) 56.

..., P., ratio estimators, 223.

..., latent roots distributions, 259.

... 339; rank serial correlation test, 360; LS
in autoregressive series, 476.

Walker, A. M., autoregressive and moving
average schemes, 484.

Walker, G., equations for autoregressive series,
4...; test in harmonic analysis, 461.

Wallis, W. A., tests of randomness, 354, 356.

W..., H., exponential weighting, 501.

Watson, G. S., robustness of AV, 98,
(Exercises 37.10-12) 115-16; joint distribution
of serial correlations, 449; regression with
autocorrelated errors, 497.

Weeks, D. L., inter-block information, 150.

Weights, choice of, in AV for linear model,
...

Welch, B. L., robustness of AV, 103, 139.

White, J. S., bias in serial correlation, ...;
moments of serial correlation in Markoff
case, 445.

Wijsman, R. A., Bartlett decomposition,
(Exercises 41.17-18) 263.

Wilk, ..., ... missing observations,
...

Wilks, S. S., LR test of independence of sets
of variates, (Exercises 41.10...) ...
271; homogeneity tests, (Example 42.2)
272, (Exercise 42.7) 282.

Williams, E. J., discrimination, ...

Williams, W. H., unbiassed regression-type
estimators, 219, 223.

Wilson, K. B., evolutionary operation,
...

Winters, P. R., exponential weighting,
501.

..., autoregressive series, 418, (Exercise 47.6)
427; causal models, 499-...

Wolfowitz, J., regression designs, 161; ...
test, 354; rank serial correlation test, 360.

Working, H., grouping in Markoff series,
(Exercise 47.15) 428.

Yates, F., method of weighted squares of means
for two-way classification, 30-1, (Exercises
35.5-7) 53; missing observations, 111, 113;
BIB designs, 142; inter-block information,
148; lattice designs, 153; confounding, 15...;
... probabilities, 173; systematic sampling,
188; estimation of variance in multi-stage
sampling, 199, 201, 204; efficiency of